Head-driven phrase structure grammar

Article snapshot taken from Wikipedia, available under the Creative Commons Attribution-ShareAlike license.

Head-driven phrase structure grammar (HPSG) is a highly lexicalized, constraint-based grammar developed by Carl Pollard and Ivan Sag. It is a type of phrase structure grammar, as opposed to a dependency grammar, and it is the immediate successor to generalized phrase structure grammar. HPSG draws from other fields such as computer science (data type theory and knowledge representation) and uses Ferdinand de Saussure's notion of the sign. It uses a uniform formalism and is organized in a modular way, which makes it attractive for natural language processing.

An HPSG grammar includes principles and grammar rules, as well as lexicon entries, which are normally not considered to belong to a grammar. The formalism is based on lexicalism: the lexicon is more than just a list of entries; it is in itself richly structured. Individual entries are marked with types, and types form a hierarchy. Early versions of the grammar were very lexicalized, with few grammatical rules (schemata). More recent research has tended to add more and richer rules, becoming more like construction grammar. The basic type HPSG deals with is the sign.

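As a rough illustration of how entries are marked with types arranged in a hierarchy, the types above can be sketched as Python classes. Only sign, word, and phrase come from the article; the more specific lexical type is a hypothetical addition for this sketch.

```python
# Sketch: HPSG types form a hierarchy, modeled here with Python classes.
# "sign" with subtypes "word" and "phrase" comes from the article; the
# more specific lexical type below it is an illustrative assumption.

class Sign:
    """The basic type: every linguistic object is a sign."""

class Word(Sign):
    """Words are one subtype of sign."""

class Phrase(Sign):
    """Phrases are the other subtype of sign."""

class IntransitiveVerb(Word):
    """A hypothetical, more specific lexical type lower in the hierarchy."""

# Subtype relations follow the hierarchy:
print(issubclass(IntransitiveVerb, Sign))  # True: a verb entry is also a sign
print(issubclass(Phrase, Word))            # False: siblings, not subtypes
```

Real HPSG systems use much richer type lattices with multiple inheritance, which single-inheritance classes only approximate.
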
Neologisms are often introduced by children, who produce erroneous forms by mistake. Other common sources are slang and advertising. There are two types of borrowings (neologisms based on external sources) that retain the sound of the source-language material.

Morphological rules constrain such derivation: the suffix "-able", for example, is usually only added to transitive verbs, as in "readable" but not "cryable". A compound word is a lexeme composed of several established lexemes, whose semantics is not the sum of that of its constituents. Compounds can be interpreted through analogy, common sense and, most commonly, context. Compound words can have simple or complex morphological structures.

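The "-able" constraint above can be sketched as a toy word-formation rule. The tiny lexicon and the rule itself are invented for illustration; a real morphological analyzer would be far more nuanced.

```python
# Sketch of a word-formation constraint: a toy rule that licenses the
# "-able" suffix only on transitive verbs. The two-entry lexicon and
# the feature names are illustrative assumptions.

lexicon = {
    "read": {"cat": "verb", "transitive": True},
    "cry":  {"cat": "verb", "transitive": False},
}

def derive_able(stem):
    """Apply the '-able' suffix if the stem is a transitive verb."""
    entry = lexicon.get(stem)
    if entry and entry["cat"] == "verb" and entry["transitive"]:
        return stem + "able"
    return None               # derivation not licensed by the rule

print(derive_able("read"))  # 'readable'
print(derive_able("cry"))   # None
```
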
Linguistic theories generally regard human languages as consisting of two parts: a lexicon, essentially a catalogue of a language's words (its wordstock), and a grammar, a system of rules which allow for the combination of those words into meaningful sentences. The lexicon is also thought to include bound morphemes, which cannot stand alone as words (such as most affixes). In some analyses, compound words and certain classes of idiomatic expressions, collocations, and other phrasemes are also considered to be part of the lexicon.

When linguists study a lexicon, they consider such things as what constitutes a word; the word/concept relationship; lexical access and lexical access failure; how a word's phonology, syntax, and meaning intersect; the morphology-word relationship; vocabulary structure within a given language; language use (pragmatics); language acquisition; the history and evolution of words (etymology); and the relationships between words, often studied within philosophy of language. Various models of how lexicons are organized and how words are retrieved have been proposed in psycholinguistics, neurolinguistics, and computational linguistics.

To describe the size of a lexicon, lexemes are grouped into lemmas. A lemma is a group of lexemes generated by inflectional morphology. Lemmas are represented in dictionaries by headwords that list the citation forms and any irregular forms, since these must be learned in order to use the words correctly. Lexemes derived from a word by derivational morphology are considered new lemmas.

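The lemma/lexeme distinction above can be sketched as a small lookup table, the way a dictionary groups inflected forms under a headword. The data and structure are invented for illustration.

```python
# Sketch: lexemes grouped into lemmas, keyed by headword. Inflectional
# forms (including irregulars) share a lemma; derived words get their
# own. The word lists here are illustrative.

lemmas = {
    "run":   ["run", "runs", "ran", "running"],  # inflectional forms, one lemma
    "goose": ["goose", "geese"],                 # irregular form listed under the headword
}

# A derivationally related word is a new lemma, not another form of "run":
lemmas["runner"] = ["runner", "runners"]

def lemma_of(form):
    """Look up the headword whose lemma contains this form."""
    for headword, forms in lemmas.items():
        if form in forms:
            return headword
    return None

print(lemma_of("geese"))   # 'goose'
print(lemma_of("runner"))  # 'runner' -- its own lemma, not grouped under 'run'
```
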
An example of a system analyzing German sentences is provided by the Freie Universität Berlin. In addition, the CoreGram project of the Grammar Group of the Freie Universität Berlin provides open-source grammars implemented in the TRALE system. Currently there are grammars for German, Danish, Mandarin Chinese, Maltese, and Persian that share a common core and are publicly available. Large HPSG grammars of various languages are being developed in the Deep Linguistic Processing with HPSG Initiative (DELPH-IN).

The end result is a sign with a verb head, empty subcategorization features, and a phonological value that orders the two children. Although the actual grammar of HPSG is composed entirely of feature structures, linguists often use trees to represent the unification of signs where the equivalent AVM would be unwieldy. Various parsers based on the HPSG formalism have been written, and optimizations are currently being investigated.

The lexicon is also organized according to open and closed categories. Closed categories, such as determiners or pronouns, are rarely given new lexemes; their function is primarily syntactic. Open categories, such as nouns and verbs, have highly active generation mechanisms, and their lexemes are more semantic in nature. A central role of the lexicon is documenting established lexical norms and conventions.

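The open/closed contrast above can be made concrete with a toy guard on lexicon insertion. The category lists and the guard function are assumptions for this sketch, not a claim about how any real lexical database works.

```python
# Sketch: open categories (nouns, verbs) freely admit new lexemes,
# while closed categories (determiners, pronouns) rarely do. Modeled
# as a simple guard when adding entries; all names are illustrative.

CLOSED_CATEGORIES = {"determiner", "pronoun"}

lexicon = {"dog": "noun", "the": "determiner"}

def add_lexeme(form, category):
    """Admit a new lexeme unless its category is closed."""
    if category in CLOSED_CATEGORIES:
        return False       # closed classes resist new members
    lexicon[form] = category
    return True

print(add_lexeme("selfie", "noun"))   # True: open category, freely extended
print(add_lexeme("thon", "pronoun"))  # False: closed category, rejected
```
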
The term "lexicon" is generally used in the context of a single language. Therefore, multilingual speakers are generally thought to have multiple lexicons. Speakers of language variants (Brazilian Portuguese and European Portuguese, for example) may be considered to possess a single lexicon. Thus a cash dispenser (British English) as well as an automatic teller machine or ATM (American English) would be understood by both American and British speakers, despite each group using different dialects.

Compounding is the most common word-formation strategy cross-linguistically. Comparative historical linguistics studies the evolution of languages and takes a diachronic view of the lexicon. The evolution of lexicons in different languages occurs through parallel mechanisms. Over time, historical forces work to shape the lexicon, making it simpler to acquire and often creating an illusion of great regularity in language.

Lexicalization is the process by which new words, having gained widespread usage, enter the lexicon. Since lexicalization may modify lexemes phonologically and morphologically, it is possible for a single etymological source to enter a single lexicon in two or more forms. Such pairs, called doublets, are often semantically close. Two examples are aptitude versus attitude and employ versus imply. The mechanisms involved are not mutually exclusive. Neologisms are new lexeme candidates which, if they gain wide usage over time, become part of a language's lexicon.

Words and phrases are two different subtypes of sign. A word has two features: [PHON] (the sound, the phonetic form) and [SYNSEM] (the syntactic and semantic information), both of which are split into subfeatures. Signs and rules are formalized as typed feature structures. HPSG generates strings by combining signs, which are defined by their location within a type hierarchy and by their internal feature structure, represented by attribute-value matrices (AVMs). Features take types or lists of types as their values, and these values may in turn have their own feature structure. Grammatical rules are largely expressed through the constraints signs place on one another.

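A minimal sketch of the feature structures described above: AVMs represented as nested Python dicts, with a naive recursive unification function. Real HPSG implementations use typed feature structures with a type hierarchy and structure sharing; this simplification treats unification as recursive dict merging that fails on conflicting atomic values. The feature names are modeled on the article's "walks" example.

```python
# Naive unification over untyped feature structures (nested dicts).
# Returns the merged structure, or None if two atomic values clash.

def unify(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            if key in out:
                merged = unify(out[key], val)
                if merged is None:       # feature clash: unification fails
                    return None
                out[key] = merged
            else:
                out[key] = val
        return out
    return a if a == b else None         # atomic values must match exactly

# The verb "walks" requires a third-person-singular nominal subject ...
walks_subj = {"HEAD": "noun", "PER": "3rd", "NUM": "sg"}
# ... and "she" supplies compatible (and some additional) values:
she = {"HEAD": "noun", "PER": "3rd", "NUM": "sg", "GEND": "fem"}
print(unify(walks_subj, she))

# A plural subject such as "they" clashes on NUM, so unification fails:
they = {"HEAD": "noun", "PER": "3rd", "NUM": "pl"}
print(unify(walks_subj, they))  # None
```

The key design point is that unification is information merging: compatible structures combine into one that carries all constraints from both, while any conflict rejects the combination outright.
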
Wide-coverage grammars of English, German, and Japanese are available under an open-source license. These grammars can be used with a variety of inter-compatible open-source HPSG parsers: LKB, PET, ACE, and agree. All of these produce semantic representations in the format of Minimal Recursion Semantics (MRS).

The declarative nature of the HPSG formalism means that these computational grammars can typically be used for both parsing and generation (producing surface strings from semantic inputs). Treebanks, also distributed by DELPH-IN, are used to develop and test the grammars, as well as to train ranking models that decide on plausible interpretations when parsing (or realizations when generating). Enju is a freely available wide-coverage probabilistic HPSG parser for English developed by the Tsujii Laboratory at the University of Tokyo in Japan.

Lexicon

A lexicon (plural: lexicons, rarely lexica) is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word lexicon derives from the Greek λεξικόν (lexikon), neuter of λεξικός (lexikos), meaning 'of or for words'.

A sign's feature structure describes its phonological, syntactic, and semantic properties. In common notation, AVMs are written with features in upper case and types in italicized lower case. Numbered indices in an AVM represent token-identical values.

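Token identity is stronger than mere equality of values. In Python terms it can be sketched as two feature paths pointing at the same object, so that refining the value through one path is visible through the other. The feature names below echo the article's example but the layout is an assumption for this sketch.

```python
# Sketch: a numbered index [1] in an AVM marks token identity
# (structure sharing). Here, two feature paths hold the *same* dict,
# so one update is seen through both paths.

index_1 = {"PER": "3rd", "NUM": "sg"}        # the shared value, tag [1]

avm = {
    "SUBJ": {"CONTENT": index_1},            # SUBJ|CONTENT is token-identical ...
    "ARG":  index_1,                         # ... to the verb's argument
}

avm["ARG"]["GEND"] = "fem"                   # refine the shared value once
print(avm["SUBJ"]["CONTENT"]["GEND"])        # 'fem' -- visible via both paths
print(avm["SUBJ"]["CONTENT"] is avm["ARG"])  # True: one object, two paths
```

Two structurally equal but distinct dicts would *not* behave this way, which is exactly the distinction the numbered indices encode.
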
Usually, only the head of a compound requires inflection for agreement. Compounding may result in lexemes of unwieldy proportion; this is compensated for by mechanisms that reduce word length. A similar phenomenon has recently been shown in social media, where hashtags compound to form longer hashtags that are at times more popular than their individual constituent hashtags.

Dictionaries are lists of the lexicon, in alphabetical order, of a given language; usually, however, bound morphemes are not included. Items in the lexicon are called lexemes, lexical items, or word forms. Lexemes are not atomic elements but contain both phonological and morphological components. When describing the lexicon, a reductionist approach is used, trying to remain general while using a minimal description.

External lexical expansion uses the source-language lexical item as the basic material for the neologization, with varying degrees of phonetic resemblance to the original item; simultaneous external and internal lexical expansion uses target-language lexical items as the basic material while still resembling the sound of the lexical item in the source language. Another mechanism involves generative devices that combine morphemes according to a language's rules.

In the simplified AVM for the word "walks" (in this case the verb, not the noun as in "nice walks for the weekend"), the verb's categorical information (CAT) is divided into features that describe it (HEAD) and features that describe its arguments (VALENCE). "Walks" is a sign of type word with a head of type verb. As an intransitive verb, "walks" has no complement but requires a subject that is a third-person singular noun. The semantic value of the subject (CONTENT) is co-indexed with the verb's only argument (the individual doing the walking).

The following AVM for "she" represents a sign with a SYNSEM value that could fulfill those requirements. Signs of type phrase unify with one or more children and propagate information upward. The following AVM encodes the immediate dominance rule for a head-subj-phrase, which requires two children: the head child (a verb) and a non-head child that fulfills the verb's SUBJ constraints.

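The head-subject combination described above can be sketched in miniature: a function that checks the non-head child against the head's SUBJ constraint, then builds a phrase whose phonology orders the two children, whose head information is propagated upward, and whose subject requirement is now discharged. The flat dict representation and feature names are simplifying assumptions, not the actual HPSG schema.

```python
# Sketch of a head-subject schema over simplified, untyped feature
# structures. A phrase is licensed only if the subject child supplies
# every value the verbal head's SUBJ constraint demands.

def head_subj_phrase(head_child, subj_child):
    """Combine a verbal head with a subject child, if compatible."""
    required = head_child["SUBJ"]
    for feature, value in required.items():
        if subj_child.get(feature) != value:
            return None                     # SUBJ constraint violated
    return {
        "PHON": subj_child["PHON"] + head_child["PHON"],  # order the children
        "HEAD": head_child["HEAD"],         # head information propagates upward
        "SUBJ": {},                         # subcategorization now satisfied
    }

walks = {"PHON": ["walks"], "HEAD": "verb",
         "SUBJ": {"HEAD": "noun", "PER": "3rd", "NUM": "sg"}}
she   = {"PHON": ["she"], "HEAD": "noun", "PER": "3rd", "NUM": "sg"}

print(head_subj_phrase(walks, she))
# {'PHON': ['she', 'walks'], 'HEAD': 'verb', 'SUBJ': {}}
```

This mirrors the prose: the result is a sign with a verb head, an empty subject requirement, and a phonological value that orders the two children.
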