Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections using complex, linguistically motivated queries. Sketch Engine takes its name from one of its key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in over 90 languages.
HNK may refer to: Croatian National Corpus (Croatian: Hrvatski nacionalni korpus); Croatian National Theatre (Croatian: Hrvatsko narodno kazalište); Fist of the North Star (Japanese: Hokuto no Ken), a manga and its franchise; Honokiol, a lignan; Hydroxynorketamine, a metabolite of ketamine; Land of
A particular corpus, document, or text. Single words and multi-word units can be extracted from monolingual or bilingual texts. The terminology extraction feature provides a list of relevant terms based on comparison with a large corpus of general language. This functionality is also available as a separate service called OneClick Terms with a dedicated interface. A free web service based on Sketch Engine and aimed at language learners and teachers
Is SKELL (formerly SkELL). It exploits Sketch Engine's proprietary GDEX (Good Dictionary Examples) scoring function to provide authentic example sentences for specific target words. Results are drawn from a special corpus of high-quality texts covering everyday, standard, formal, and professional language and displayed as a concordance. SKELL also includes simplified versions of Sketch Engine's word sketch and thesaurus functions. It has been suggested that SKELL can be used, for instance, to help students understand
Is available via the web search interface Bonito 2, which is part of NoSketch Engine, a limited version of the Sketch Engine software. Sketch Engine is a product of Lexical Computing, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff. He started a collaboration with Pavel Rychlý, a computer scientist working at the Natural Language Processing Centre, Masaryk University, and
Is based on the idea of inverted indexing (keeping an index of all positions of a given word in the text). It has been used to index text corpora comprising tens of billions of words. Corpora indexed by Manatee are searched by formulating queries in the Corpus Query Language (CQL). Manatee is written in C++ and offers an API for a number of other programming languages, including Python, Java, Perl and Ruby.
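The inverted indexing idea mentioned above (an index of all positions of a given word in the text) can be illustrated with a minimal Python sketch. This is not Manatee's implementation or API; the function names `build_inverted_index` and `find_phrase` and the toy token list are hypothetical and serve only to show the general technique.

```python
from collections import defaultdict


def build_inverted_index(tokens):
    """Map each lowercased word form to the list of positions where it occurs.

    Illustrative only: a real corpus manager stores such an index on disk
    in a compressed form rather than in a Python dictionary.
    """
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token.lower()].append(position)
    return dict(index)


def find_phrase(index, phrase):
    """Return the start positions where the words of `phrase` occur consecutively."""
    words = [w.lower() for w in phrase]
    if not words or any(w not in index for w in words):
        return []
    # Position sets for the second and later words, for fast membership tests.
    later = [set(index[w]) for w in words[1:]]
    # A start position matches if every later word sits at the right offset.
    return [
        start
        for start in index[words[0]]
        if all(start + offset + 1 in positions
               for offset, positions in enumerate(later))
    ]


# Toy corpus, already tokenized (hypothetical data).
tokens = "the cat sat on the mat and the cat slept".split()
index = build_inverted_index(tokens)
matches = find_phrase(index, ["the", "cat"])
```

Because lookups go through the position lists instead of scanning the text, the cost of a query depends mainly on how often the queried words occur, which is what makes this kind of index practical for very large corpora.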
Croatian National Corpus (Croatian: Hrvatski nacionalni korpus, HNK) is the biggest and the most important corpus of Croatian. Its compilation started in 1998 at the Institute of Linguistics of the Faculty of Humanities and Social Sciences, University of Zagreb following
The TenTen Corpus Family (multi-billion web corpora), and Trends corpora (monitor corpora with daily updates). Sketch Engine consists of three main components: an underlying database management system called Manatee, a web interface search front-end called Bonito, and a web interface for corpus building and management called Corpus Architect. Manatee is a database management system specifically devised for effective indexing of large text corpora. It
The HNK (today still with free test access), a free client program, Bonito, is needed. The author of this corpus manager is Pavel Rychlý from the Natural Language Processing Laboratory of the Faculty of Informatics, Masaryk University in Brno, Czech Republic. Its interface supports complex and elaborate queries over the corpus, different types of statistical results, complete or partial word lists according to different query criteria (with their frequencies), frequency distributions of types, automatic collocation detection, etc. The latest version of this corpus (version 3) has 216.8 million tokens. The online search
The Lustrous (Japanese: Hōseki no Kuni), a manga
The developer of Manatee and Bonito (two major parts of the software suite). Kilgarriff also introduced the concept of word sketches. Since then, Sketch Engine has been commercial software; however, all the core features of Manatee and Bonito that were developed by 2003 (and extended since then) are freely available under the GPL license within the NoSketch Engine suite. A list of tools available in Sketch Engine: Sketch Engine can perform automatic term extraction by identifying words typical of
The ideas of Marko Tadić. The theoretical foundations and the expression of the need for a general-purpose, representative and multi-million corpus of Croatian started to appear even earlier. The Croatian National Corpus is compiled from selected texts written in Croatian covering all fields, topics, genres and styles: from literary and scientific texts to textbooks, newspapers, user groups and chat rooms. The initial composition
The meaning and/or usage of a word or phrase; to help teachers wanting to use example sentences in a class; to discover and explore collocates; to create gap-fill exercises; to teach various kinds of homonyms and polysemous words. SKELL was first presented in 2014, when only English was supported. Later, support was added for Russian, Czech, German, Italian and Estonian. Sketch Engine provides access to more than 700 text corpora. There are monolingual as well as multilingual corpora of different sizes (from thousands of words up to 60 billion words) and from various sources (e.g. web, books, subtitles, legal documents). The list of corpora includes the British National Corpus, Brown Corpus, Cambridge Academic English Corpus and Cambridge Learner Corpus, CHILDES corpora of child language, OpenSubtitles (a set of 60 parallel corpora), 24 multilingual corpora of EUR-Lex documents,
Was divided in two constituents: Since 2004, with the adoption of the concept of the 3rd generation corpus, the two-constituent structure has been abandoned in favor of several subcorpora and a larger size. Since 2005, HNK has comprised 105 million tokens and is composed of a number of different subcorpora, which can be searched individually or all together as a whole corpus. In 2004, HNK also migrated to a new server platform, namely the Manatee/Bonito server-client architecture. For searching