Manually Annotated Sub-Corpus

The Manually Annotated Sub-Corpus (MASC) is a balanced subset of 500K words of written texts and transcribed speech drawn primarily from the Open American National Corpus (OANC). The OANC is a 15 million word (and growing) corpus of American English produced since 1990, all of which is in the public domain or otherwise free of usage and redistribution restrictions.

All of MASC includes manually validated annotations for logical structure (headings, sections, paragraphs, etc.), sentence boundaries, three different tokenizations with associated part-of-speech tags, shallow parse (noun and verb chunks), named entities (person, location, organization, date and time), and Penn Treebank syntax. Additional manually produced or validated annotations have been created by the MASC project for portions of the sub-corpus, including full-text annotation for FrameNet frame elements and a 100K+ sentence corpus with WordNet 3.1 sense tags, of which one-tenth are also annotated for FrameNet frame elements. Annotations of all or portions of the sub-corpus for a wide variety of other linguistic phenomena have been contributed by other projects, including PropBank, TimeBank, MPQA opinion, and several others. Co-reference annotations and clause boundaries of the entire MASC corpus are scheduled to be released by the end of 2016. WordNet sense annotations for all occurrences of 114 words are also included in the MASC distribution, as well as FrameNet annotations for 50–100 occurrences of each of the 114 words. The sentences with WordNet and FrameNet annotations are also distributed as a part of the MASC Sentence Corpus.

Unlike most freely available corpora including a wide variety of linguistic annotations, MASC contains a balanced selection of texts from a broad range of genres:

At present, MASC includes seventeen different types of linguistic annotation (* = in production; ** currently available in original format only):

All MASC annotations, whether contributed or produced in-house, are transduced to the Graph Annotation Format (GrAF) defined by ISO TC37 SC4's Linguistic Annotation Framework (LAF). The online tool ANC2Go can transduce annotations over all or parts of MASC to any of several other formats, including CoNLL IOB format and formats for use in UIMA and General Architecture for Text Engineering.

MASC is an open data resource that can be used by anyone for any purpose. At the same time, it is a collaborative community resource that is sustained by community contributions of annotations and derived data. It is freely downloadable from the MASC download page or through the Linguistic Data Consortium. MASC is also distributed in part-of-speech-tagged form with the Natural Language Toolkit.

FrameNet

FrameNet is a group of online lexical databases based upon the theory of meaning known as frame semantics, developed by linguist Charles J. Fillmore. The project's fundamental notion is simple: most words' meanings may be best understood in terms of a semantic frame, which is a description of a certain kind of event, connection, or item and its actors. As an illustration, the act of cooking usually requires the following: a cook, the food being cooked, a container to hold the food while it is being cooked, and a heating instrument. Within FrameNet, this act is represented by a frame named Apply_heat, and its components (Cook, Food, Container, and Heating_instrument) are referred to as frame elements (FEs). The Apply_heat frame also lists a number of words that represent it, known as lexical units (LUs), like fry, bake, boil, and broil. Other frames are simpler. For example, Placing only has an agent or cause, a theme (something that is placed), and the location where it is placed. Some frames are more complex, like Revenge, which contains more FEs (offender, injury, injured party, avenger, and punishment). As in the examples of Apply_heat and Revenge below, FrameNet's role is to define the frames and annotate sentences to demonstrate how the FEs fit syntactically around the word that elicits the frame.

A frame is a schematic representation of a situation involving various participants, props, and other conceptual roles. Examples of frame names are Being_born and Locative_relation. A frame in FrameNet contains a textual description of what it represents (a frame definition), associated frame elements, lexical units, example sentences, and frame-to-frame relations.

Frame elements (FEs) provide additional information to the semantic structure of a sentence. Each frame has a number of core and non-core FEs which can be thought of as semantic roles. Core FEs are essential to the meaning of the frame, while non-core FEs are generally descriptive (such as time, place, manner, etc.). For example:

FrameNet includes shallow data on syntactic roles that frame elements play in the example sentences. For example, for a sentence like "She was born about AD 460", FrameNet would mark She as a noun phrase referring to the Child frame element, and "about AD 460" as a noun phrase corresponding to the Time frame element. Details of how frame elements can be realized in a sentence are important because they reveal information about the subcategorization frames as well as possible diathesis alternations (e.g. "John broke the window" vs. "The window broke") of a verb.

Lexical units (LUs) are lemmas, with their part of speech, that evoke a specific frame. In other words, when an LU is identified in a sentence, that specific LU can be associated with its specific frame(s). For each frame, there may be many LUs associated with that frame, and also there may be many frames that share a specific LU; this is typically the case with LUs that have multiple word senses. Alongside the frame, each lexical unit is associated with specific frame elements by means of the annotated example sentences. For example, lexical units that evoke the Complaining frame (or more specific perspectivized versions of it, to be precise) include the verbs complain, grouse, lament, and others.

Frames are associated with example sentences, and frame elements are marked within the sentences. Thus, the sentence "She was born about AD 460" is associated with the frame Being_born, while She is marked as the frame element Child and "about AD 460" is marked as Time. From the start, the FrameNet project has been committed to looking at evidence from actual language use as found in text collections like the British National Corpus. Based on such example sentences, automatic semantic role labeling tools are able to determine frames and mark frame elements in new sentences.

FrameNet also exposes statistics on the valence of each frame; that is, the number and position of the frame elements within example sentences. The example sentence falls in the valence pattern which occurs twice in FrameNet's annotation report for the born.v lexical unit, namely:

FrameNet additionally captures relationships between different frames using relations. These include the following:

FrameNet has proven to be useful in a number of computational applications, because computers need additional knowledge in order to recognize that "John sold a car to Mary" and "Mary bought a car from John" describe essentially the same situation, despite using two quite different verbs, different prepositions and a different word order. FrameNet has been used in applications like question answering, paraphrasing, recognizing textual entailment, and information extraction, either directly or by means of Semantic Role Labeling tools. The first automatic system for Semantic Role Labeling (SRL, sometimes also referred to as "shallow semantic parsing") was developed by Daniel Gildea and Daniel Jurafsky based on FrameNet in 2002. Semantic Role Labeling has since become one of the standard tasks in natural language processing, with the latest version (1.7) of FrameNet now fully supported in the Natural Language Toolkit.

Since frames are essentially semantic descriptions, they are similar across languages, and several projects have arisen over the years that have relied on the original FrameNet as the basis for additional non-English FrameNets, for Spanish, Japanese, German, and Polish, among others.

Noun phrase

A noun phrase (NP), or nominal (phrase), is a phrase that usually has a noun or pronoun as its head and has the same grammatical functions as a noun. Noun phrases are very common cross-linguistically, and they may be the most frequently occurring phrase type. Noun phrases often function as verb subjects and objects, as predicative expressions, and as complements of prepositions. One NP can be embedded inside another NP; for instance, some of his constituents has as a constituent the shorter NP his constituents. In some theories of grammar, noun phrases with determiners are analyzed as having the determiner as the head of the phrase; see for instance Chomsky (1995) and Hudson (1990).

Some examples of noun phrases are underlined in the sentences below. The head noun appears in bold.

Noun phrases can be identified by the possibility of pronoun substitution, as is illustrated in the examples below. A string of words that can be replaced by a single pronoun without rendering the sentence grammatically unacceptable is a noun phrase. As to whether the string must contain at least two words, see the following section.

Traditionally, a phrase is understood to contain two or more words. The traditional progression in the size of syntactic units is word < phrase < clause, and in this approach a single word (such as a noun or pronoun) would not be referred to as a phrase. However, many modern schools of syntax, especially those that have been influenced by X-bar theory, make no such restriction. Here many single words are judged to be phrases based on a desire for theory-internal consistency. A phrase is deemed to be a word or a combination of words that appears in a set syntactic position, for instance in subject position or object position. On this understanding of phrases, the nouns and pronouns in bold in the following sentences are noun phrases (as well as nouns or pronouns):

The words in bold are called phrases since they appear in the syntactic positions where multiple-word phrases (i.e. traditional phrases) can appear. This practice takes the constellation to be primitive rather than the words themselves. The word he, for instance, functions as a pronoun, but within the sentence it also functions as a noun phrase. The phrase structure grammars of the Chomskyan tradition (government and binding theory and the minimalist program) are primary examples of theories that apply this understanding of phrases. Other grammars such as dependency grammars are likely to reject this approach to phrases, since they take the words themselves to be primitive. For them, phrases must contain two or more words.

A typical noun phrase consists of a noun (the head of the phrase) together with zero or more dependents of various types. (These dependents, since they modify a noun, are called adnominal.) The chief types of these dependents are:

The allowability, form and position of these elements depend on the syntax of the language in question. In English, determiners, adjectives (and some adjective phrases) and noun modifiers precede the head noun, whereas the heavier units (phrases and clauses) generally follow it. This is part of a strong tendency in English to place heavier constituents to the right, making English more of a head-initial language. Head-final languages (e.g. Japanese and Turkish) are more likely to place all modifiers before the head noun. Other languages, such as French, often place even single-word adjectives after the noun.

Noun phrases can take different forms from that described above, for example when the head is a pronoun rather than a noun, or when elements are linked with a coordinating conjunction such as and, or, but. For more information about the structure of noun phrases in English, see English grammar § Phrases.

Noun phrases typically bear argument functions. That is, the syntactic functions that they fulfill are those of the arguments of the main clause predicate, particularly those of subject, object and predicative expression. They also function as arguments in such constructs as participial phrases and prepositional phrases. For example:

Sometimes a noun phrase can also function as an adjunct of the main clause predicate, thus taking on an adverbial function, e.g.

In some languages, including English, noun phrases are required to be "completed" with a determiner in many contexts, and thus a distinction is made in syntactic analysis between phrases that have received their required determiner (such as the big house) and those in which the determiner is lacking (such as big house). The situation is complicated by the fact that in some contexts a noun phrase may nonetheless be used without a determiner (as in I like big houses); in this case the phrase may be described as having a "null determiner". (Situations in which this is possible depend on the rules of the language in question; for English, see English articles.)

In the original X-bar theory, the two respective types of entity are called noun phrase (NP) and N-bar (N̄ or N′). Thus in the sentence Here is the big house, both house and big house are N-bars, while the big house is a noun phrase. In the sentence I like big houses, both houses and big houses are N-bars, but big houses also functions as a noun phrase (in this case without an explicit determiner).

In some modern theories of syntax, however, what are called "noun phrases" above are no longer considered to be headed by a noun, but by the determiner (which may be null), and they are thus called determiner phrases (DP) instead of noun phrases. (In some accounts that take this approach, the constituent lacking the determiner, the one called N-bar above, may be referred to as a noun phrase.) This analysis of noun phrases is widely referred to as the DP hypothesis. It has been the preferred analysis of noun phrases in the minimalist program from its start (since the early 1990s), though the arguments in its favor tend to be theory-internal. By taking the determiner, a function word, to be head over the noun, a structure is established that is analogous to the structure of the finite clause, with a complementizer. Apart from the minimalist program, however, the DP hypothesis is rejected by most other modern theories of syntax and grammar, in part because these theories lack the relevant functional categories. Dependency grammars, for instance, almost all assume the traditional NP analysis of noun phrases. For illustrations of different analyses of noun phrases depending on whether the DP hypothesis is rejected or accepted, see the next section.

The representation of noun phrases using parse trees depends on the basic approach to syntactic structure adopted. The layered trees of many phrase structure grammars grant noun phrases an intricate structure that acknowledges a hierarchy of functional projections. Dependency grammars, in contrast, produce simple, relatively flat structures for noun phrases, since the basic architecture of dependency places a major limitation on the amount of structure that the theory can assume. The representation also depends on whether the noun or the determiner is taken to be the head of the phrase (see the discussion of the DP hypothesis in the previous section). Below are some possible trees for the two noun phrases the big house and big houses (as in the sentences Here is the big house and I like big houses).

1. Phrase-structure trees, first using the original X-bar theory, then using the current DP approach:
2. Dependency trees, first using the traditional NP approach, then using the DP approach:

The following trees represent a more complex phrase. For simplicity, only dependency-based trees are given. The first tree is based on the traditional assumption that nouns, rather than determiners, are the heads of phrases. The head noun picture has the four dependents the, old, of Fred, and that I found in the drawer. The tree shows how the lighter dependents appear as pre-dependents (preceding their head) and the heavier ones as post-dependents (following their head). The second tree assumes the DP hypothesis, namely that determiners serve as phrase heads, rather than nouns. The determiner the is now depicted as the head of the entire phrase, thus making the phrase a determiner phrase. There is still a noun phrase present (old picture of Fred that I found in the drawer), but this phrase is below the determiner.

An early conception of the noun phrase can be found in First work in English by Alexander Murison. In this conception a noun phrase is "the infinitive of the verb" (p. 146), which may appear "in any position in the sentence where a noun may appear". For example, to be just is more important than to be generous has two underlined infinitives which may be replaced by nouns, as in justice is more important than generosity. This same conception can be found in subsequent grammars, such as 1878's A Tamil Grammar or 1882's Murby's English grammar and analysis, where the conception of an X phrase is a phrase that can stand in for X. By 1912, the concept of a noun phrase as being based around a noun can be found, for example, "an adverbial noun phrases is a group of words of which the noun is the base word, that tells the time or place of an action, or how long, how far, or how much". By 1924, the idea of a noun phrase being a noun plus dependents seems to be established. For example, "Note order of words in noun-phrase--noun + adj. + genitive" suggests