FrameNet is a group of online lexical databases based on the theory of meaning known as frame semantics, developed by linguist Charles J. Fillmore. The project's fundamental notion is simple: the meanings of most words are best understood in terms of a semantic frame, which is a description of a type of event, relation, or entity and the participants in it.
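To make the idea concrete, a frame can be modelled as a small record holding its frame elements (FEs) and the lexical units (LUs) that evoke it, with each annotated sentence pointing at the text spans that fill its FEs. The Python sketch below is a hypothetical illustration, not FrameNet's actual data format. The class and function names (`Frame`, `Annotation`, `frames_evoked_by`) are invented, and only the FEs and LUs this article mentions for Apply_heat and Being_born are included; the real frames define more.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical, simplified model of a FrameNet frame."""
    name: str
    frame_elements: list[str]
    lexical_units: list[str] = field(default_factory=list)


@dataclass
class Annotation:
    """An example sentence annotated with the frame it evokes and its FE spans."""
    sentence: str
    frame: Frame
    fe_spans: dict[str, str]

    def validate(self) -> None:
        # Every annotated FE must belong to the frame, and every span must occur in the sentence.
        for fe, span in self.fe_spans.items():
            if fe not in self.frame.frame_elements:
                raise ValueError(f"{fe!r} is not a frame element of {self.frame.name}")
            if span not in self.sentence:
                raise ValueError(f"span {span!r} not found in sentence")


def frames_evoked_by(lu: str, frames: list[Frame]) -> list[str]:
    """Names of all frames that list `lu`; an LU with several senses can evoke several frames."""
    return [f.name for f in frames if lu in f.lexical_units]


# Only the FEs and LUs mentioned in this article are listed.
APPLY_HEAT = Frame(
    name="Apply_heat",
    frame_elements=["Cook", "Food", "Container", "Heating_instrument"],
    lexical_units=["fry.v", "bake.v", "boil.v", "broil.v"],
)
BEING_BORN = Frame("Being_born", ["Child", "Time"], ["born.v"])

born_example = Annotation(
    sentence="She was born about AD 460",
    frame=BEING_BORN,
    fe_spans={"Child": "She", "Time": "about AD 460"},
)
born_example.validate()  # raises ValueError if the annotation is inconsistent
```

Keeping LUs as `lemma.pos` strings (e.g. `born.v`) mirrors how the article names lexical units; a fuller model would also store core versus non-core status for each FE.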
As an illustration, the act of cooking typically involves a cook, the food being cooked, a container to hold the food while it is being cooked, and a heating instrument. Within FrameNet, this act is represented by a frame named Apply_heat, and its components (Cook, Food, Container, and Heating_instrument) are referred to as frame elements (FEs). The Apply_heat frame also lists
A Proto-Agent, and proposed that the nominal with the most elements of the Proto-Agent and the fewest elements of the Proto-Patient tends to be treated as the agent in a sentence. This addresses the problem, faced by most semanticists, of deciding on the number and nature of thematic roles. For example, in the sentence "His energy surprised everyone", his energy is the agent, even though it does not have most of
A Proto-Agent, the co-agent Sylvia is downgraded to patient because she is the direct object of the sentence.
Question answering
Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP) that is concerned with building systems that automatically answer questions posed by humans in a natural language. A question-answering implementation, usually
A computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, question-answering systems can pull answers from an unstructured collection of natural language documents. Some examples of natural language document collections used for question answering systems include: Question-answering research attempts to develop ways of answering
A computer program. In the 1970s, knowledge bases were developed that targeted narrower domains of knowledge. The question answering systems developed to interface with these expert systems produced more repeatable and valid responses to questions within an area of knowledge. These expert systems closely resembled modern question answering systems except in their internal architecture. Expert systems rely heavily on expert-constructed and organized knowledge bases, whereas many modern question answering systems rely on statistical processing of
A large, unstructured, natural language text corpus. The 1970s and 1980s saw the development of comprehensive theories in computational linguistics, which led to the development of ambitious projects in text comprehension and question answering. One example was the Unix Consultant (UC), developed by Robert Wilensky at U.C. Berkeley in the late 1980s. The system answered questions pertaining to
A mathematical formula retrieved from Wikidata as a succinct answer, translated into a computable form that allows the user to insert values for the variables. The system retrieves names and values of variables and common constants from Wikidata if those are available. It is claimed that the system outperforms a commercial computational mathematical knowledge engine on a test set. MathQA is hosted by Wikimedia at https://mathqa.wmflabs.org/. In 2022, it
A number of computational applications, because computers need additional knowledge in order to recognize that "John sold a car to Mary" and "Mary bought a car from John" describe essentially the same situation, despite using two quite different verbs, different prepositions and a different word order. FrameNet has been used in applications like question answering, paraphrasing, recognizing textual entailment, and information extraction, either directly or by means of Semantic Role Labeling tools. The first automatic system for Semantic Role Labeling (SRL, sometimes also referred to as "shallow semantic parsing")
A number of words that represent it, known as lexical units (LUs), like fry, bake, boil, and broil. Other frames are simpler. For example, Placing only has an agent or cause, a theme (something that is placed), and the location where it is placed. Some frames are more complex, like Revenge, which contains more FEs (offender, injury, injured party, avenger, and punishment). As in
A permanent trait of agency (agent noun: runner, kicker, etc.), an agent noun is not necessarily an agent of a sentence: "Jack kicked the runner". For many people, the notion of agency is easy to grasp intuitively but difficult to define: typical qualities of a grammatical agent are that it has volition, is sentient or perceives, causes a change of state, or moves. The linguist David Dowty included these qualities in his definition of
A query in its logical form. Accepting natural language questions makes the system more user-friendly, but harder to implement, as there are a variety of question types and the system will have to identify the correct one in order to give a sensible answer. Assigning a question type to the question is a crucial task; the entire answer extraction process relies on finding the correct question type and hence
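The keyword step of question-type analysis can be sketched as a lookup from the first question word to an expected answer type, following the examples in this article ("Who" -> Person, "Where" -> Location, "When" -> Date, "How many" -> Number). The cue table and the function name `expected_answer_type` are hypothetical. Real systems add POS tagging, syntactic parsing and lexical resources such as WordNet, which matter most for ambiguous words like "Which", "What" and "How"; this sketch simply returns `None` for those.

```python
import re
from typing import Optional

# Hypothetical cue-word table taken from the examples in the text.
SINGLE_CUES = {"who": "Person", "where": "Location", "when": "Date"}
# Interrogatives that do not map to a single answer type on their own.
AMBIGUOUS_CUES = {"which", "what", "how"}


def expected_answer_type(question: str) -> Optional[str]:
    """Map the first question word to an expected answer type.

    Returns None when the first cue is ambiguous ("which", "what", bare "how")
    or when no cue is found; a real system would then inspect other words.
    """
    tokens = re.findall(r"[a-z]+", question.lower())
    for i, tok in enumerate(tokens):
        if tok == "how" and i + 1 < len(tokens) and tokens[i + 1] == "many":
            return "Number"
        if tok in SINGLE_CUES:
            return SINGLE_CUES[tok]
        if tok in AMBIGUOUS_CUES:
            return None
    return None
```

Scanning for the first cue, rather than any cue, keeps a question such as "Where was the man who invented radio born?" classified as Location instead of Person.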
A retriever-reader architecture. The retriever fetches documents relevant to a given question, while the reader infers the answer from the retrieved documents. Systems such as GPT-3, T5, and BART use an end-to-end architecture in which a transformer-based architecture stores large-scale textual data in the underlying parameters. Such models can answer questions without accessing any external knowledge sources. Question answering
A set of formula variants. Subsequently, the variables are substituted with random values to generate a large number of different questions suitable for individual student tests. PhysWikiQuiz is hosted by Wikimedia at https://physwikiquiz.wmflabs.org/. Question answering systems have been extended in recent years to encompass additional domains of knowledge. For example, systems have been developed to automatically answer temporal and geospatial questions, questions of definition and terminology, biographical questions, multilingual questions, and questions about
A textual description of what it represents (a frame definition), associated frame elements, lexical units, example sentences, and frame-to-frame relations. Frame elements (FEs) provide additional information to the semantic structure of a sentence. Each frame has a number of core and non-core FEs, which can be thought of as semantic roles. Core FEs are essential to the meaning of the frame, while non-core FEs are generally descriptive (such as time, place, manner, etc.). For example: FrameNet includes shallow data on syntactic roles that frame elements play in
A wide range of question types, including fact, list, definition, how, why, hypothetical, semantically constrained, and cross-lingual questions. Another way to categorize question-answering systems is by the technical approach used. There are a number of different types of QA systems, including rule-based, statistical, and hybrid systems. Rule-based systems use a set of rules to determine the correct answer to a question. Statistical systems use statistical methods to find
Is a challenging problem because semantic relatedness is not trivial. The lab was motivated by the fact that 20% of mathematical queries in general-purpose search engines are expressed as well-formed questions. The challenge contained two separate sub-tasks: Task 1, "Answer retrieval", matching old post answers to newly posed questions; and Task 2, "Formula retrieval", matching old post formulae to new questions. Starting with
Is based explicitly on its relationship to the action or event expressed by the verb (e.g. "He who kicked the ball"), whereas the subject is based on a more formal title using the theory of the information flow (e.g. "Jack kicked the ball"). In the sentence "The boy kicked the ball", the boy is the agent and the subject. However, when the sentence is rendered in the passive voice, "The ball
Is dependent on a good search corpus; without documents containing the answer, there is little any question answering system can do. Larger collections generally mean better question answering performance, unless the question domain is orthogonal to the collection. Data redundancy in massive collections, such as the web, means that nuggets of information are likely to be phrased in many different ways in differing contexts and documents, leading to two benefits: Some question answering systems rely heavily on automated reasoning. In information retrieval, an open-domain question answering system tries to return an answer in response to
The Unix operating system. It had a comprehensive, hand-crafted knowledge base of its domain, and it aimed at phrasing the answer to accommodate various types of users. Another project was LILOG, a text-understanding system that operated on the domain of tourism information in a German city. The systems developed in the UC and LILOG projects never went past the stage of simple demonstrations, but they helped
The valence of each frame; that is, the number and position of the frame elements within example sentences. The sentence falls in the valence pattern which occurs twice in FrameNet's annotation report for the born.v lexical unit, namely: FrameNet additionally captures relationships between different frames using relations. These include the following: FrameNet has proven to be useful in
The answer type. In the example above, the subject is "Chinese National Day", the predicate is "is" and the adverbial modifier is "when"; therefore the answer type is "Date". Unfortunately, some interrogative words like "Which", "What", or "How" do not correspond to unambiguous answer types: each can represent more than one type. In situations like this, other words in the question need to be considered. A lexical dictionary such as WordNet can be used for understanding
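The candidate-scoring step described in this article rewards a candidate answer for having more question words near it, and for those words being closer (the more and the closer, the better). A minimal sketch follows. The weighting 1 / (1 + distance), the tiny stop-word list and the function names are assumptions made for illustration; the article does not specify a formula.

```python
import re

# Hypothetical stop-word list: question words and function words carry no evidence.
STOPWORDS = {"a", "an", "the", "is", "of", "who", "what", "when", "where", "which", "how"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def score_candidate(question: str, passage: str, candidate: str) -> float:
    """Each question keyword present in the passage adds 1 / (1 + d), where d is its
    token distance to the candidate, so more keywords and closer keywords score higher."""
    keywords = {t for t in tokenize(question) if t not in STOPWORDS}
    tokens = tokenize(passage)
    cand = tokenize(candidate)
    n = len(cand)
    starts = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == cand]
    if not cand or not starts:
        return 0.0  # candidate does not occur in the passage
    start, end = starts[0], starts[0] + n - 1
    score = 0.0
    for kw in keywords:
        # Distance from each occurrence of the keyword to the candidate span (0 if inside it).
        distances = [
            start - i if i < start else max(0, i - end)
            for i, tok in enumerate(tokens)
            if tok == kw
        ]
        if distances:
            score += 1.0 / (1 + min(distances))
    return score


def best_candidate(question: str, passage: str, candidates: list[str]) -> str:
    return max(candidates, key=lambda c: score_candidate(question, passage, c))
```

For the article's example question, "When is the national day of China?", and a passage such as "The national day of China is celebrated on 1st Oct. every year.", the candidate "1st Oct" sits closer to "national", "day" and "China" than "every year" does, so it ranks first.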
The candidate: the more and the closer, the better. The answer is then translated by parsing into a compact and meaningful representation. In the previous example, the expected output answer is "1st Oct." An open-source, math-aware, question answering system called MathQA, based on Ask Platypus and Wikidata, was published in 2018. MathQA takes an English or Hindi natural language question as input and returns
The case with LUs that have multiple word senses. Alongside the frame, each lexical unit is associated with specific frame elements by means of the annotated example sentences. For example, lexical units that evoke the Complaining frame (or more specific perspectivized versions of it, to be precise) include the verbs complain, grouse, lament, and others. Frames are associated with example sentences and frame elements are marked within
The content of audio, images, and video. Current question answering research topics include: In 2011, Watson, a question answering computer system developed by IBM, competed in two exhibition matches of Jeopardy! against Brad Rutter and Ken Jennings, winning by a significant margin. Facebook Research made their DrQA system available under an open source license. This system uses Wikipedia as its knowledge source. The open source framework Haystack by deepset combines open-domain question answering with generative question answering and supports
The context. Once the system identifies the question type, it uses an information retrieval system to find a set of documents that contain the correct keywords. A tagger and NP/Verb Group chunker can verify whether the correct entities and relations are mentioned in the found documents. For questions such as "Who" or "Where", a named-entity recogniser finds relevant "Person" and "Location" names from
The correct answer type. Keyword extraction is the first step in identifying the input question type. In some cases, words clearly indicate the question type, e.g., "Who", "Where", "When", or "How many". These words might suggest to the system that the answers should be of type "Person", "Location", "Date", or "Number", respectively. POS (part-of-speech) tagging and syntactic parsing techniques can also determine
The development of theories on computational linguistics and reasoning. Specialized natural-language question answering systems have been developed, such as EAGLi for health and life scientists. QA systems are used in a variety of applications, including: As of 2001, question-answering systems typically included a question classifier module that determined the type of question and the type of answer. Different types of question-answering systems employ different architectures. For example, modern open-domain question answering systems may use
The domain of mathematics, which involves formula language, the goal is to later extend the task to other domains (e.g., STEM disciplines, such as chemistry, biology, etc.), which employ other types of special notation (e.g., chemical formulae). The inverse of mathematical question answering, namely mathematical question generation, has also been researched. The PhysWikiQuiz physics question generation and test engine retrieves mathematical formulae from Wikidata together with semantic information about their constituting identifiers (names and values of variables). The formulae are then rearranged to generate
The example sentences. For example, for a sentence like "She was born about AD 460", FrameNet would mark She as a noun phrase referring to the Child frame element, and "about AD 460" as a noun phrase corresponding to the Time frame element. Details of how frame elements can be realized in a sentence are important because they reveal information about the subcategorization frames as well as possible diathesis alternations (e.g. "John broke
The examples of Apply_heat and Revenge below, FrameNet's role is to define the frames and annotate sentences to demonstrate how the FEs fit syntactically around the word that elicits the frame. A frame is a schematic representation of a situation involving various participants, props, and other conceptual roles. Examples of frame names are Being_born and Locative_relation. A frame in FrameNet contains
The first chatterbot programs. SHRDLU was a successful question-answering program developed by Terry Winograd in the late 1960s and early 1970s. It simulated the operation of a robot in a toy world (the "blocks world"), and it offered the possibility of asking the robot questions about the state of the world. The strength of this system was the choice of a very specific domain and a very simple world with rules of physics that were easy to encode in
FrameNet - Misplaced Pages Continue
The most likely answer to a question. Hybrid systems use a combination of rule-based and statistical methods. Two early question answering systems were BASEBALL and LUNAR. BASEBALL answered questions about Major League Baseball over a period of one year. LUNAR answered questions about the geological analysis of rocks returned by the Apollo Moon missions. Both question answering systems were very effective in their chosen domains. LUNAR
The retrieved documents. Only the relevant paragraphs are selected for ranking. A vector space model can classify the candidate answers. The system then checks whether each candidate answer is of the correct type, as determined in the question type analysis stage. An inference technique can validate the candidate answers. A score is then given to each of these candidates according to the number of question words it contains and how close these words are to
The sentences. Thus, the sentence is associated with the frame Being_born, while She is marked as the frame element Child and "about AD 460" is marked as Time. From the start, the FrameNet project has been committed to looking at evidence from actual language use as found in text collections like the British National Corpus. Based on such example sentences, automatic semantic role labeling tools are able to determine frames and mark frame elements in new sentences. FrameNet also exposes statistics on
The situation is denoted by a sentence, the action by a verb in the sentence, and the agent by a noun phrase. For example, in the sentence "Jack kicked the ball", Jack is the agent and the ball is the patient. In certain languages, the agent is declined or otherwise marked to indicate its grammatical role. Modern English does not mark the agentive grammatical role of a noun in a sentence. Although certain nouns do have
The subject is determined syntactically, primarily through word order, the agent is determined through its relationship to the action expressed by the verb. For example, in the sentence "The little girl was bitten by the dog", girl is the subject, but dog is the agent. The word agent comes from the present participle agens, agentis ('the one doing') of the Latin verb agere, to 'do' or 'make'. Typically,
The typical agent-like qualities such as perception, movement, or volition. Even Dowty's solution fails for verbs expressing relationships in time: compare (1) "April precedes May" with (2) "May follows April". Here, which argument is the agent and which is the patient must be specified for each individual verb. The grammatical agent is often confused with the subject, but the two notions are quite distinct: the agent
The user's question. The returned answer is in the form of short texts rather than a list of relevant documents. The system finds answers by using a combination of techniques from computational linguistics, information retrieval, and knowledge representation. The system takes a natural language question as an input rather than a set of keywords, for example: "When is the national day of China?" It then transforms this input sentence into
The window" vs. "The window broke") of a verb. Lexical units (LUs) are lemmas, with their part of speech, that evoke a specific frame. In other words, when an LU is identified in a sentence, that specific LU can be associated with its specific frame(s). For each frame, there may be many LUs associated with that frame, and also there may be many frames that share a specific LU; this is typically
The years that have relied on the original FrameNet as the basis for additional non-English FrameNets, for Spanish, Japanese, German, and Polish, among others.
Agent (grammar)
In linguistics, a grammatical agent is the thematic relation of the cause or initiator to an event. The agent is a semantic concept distinct from the subject of a sentence as well as from the topic. While
Was demonstrated at a lunar science convention in 1971 and it was able to answer 90% of the questions in its domain that were posed by people untrained on the system. Further restricted-domain question answering systems were developed in the following years. The common feature of all these systems is that they had a core database or knowledge system that was hand-written by experts of the chosen domain. The language abilities of BASEBALL and LUNAR used techniques similar to ELIZA and DOCTOR,
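Semantic role labeling has to separate the agent from the grammatical subject, which differ in passive clauses such as "The ball was kicked by the boy". As a deliberately tiny, rule-based illustration (not how FrameNet-based SRL systems work; those learn from annotated sentences), the sketch below handles only simple active clauses with a regular "-ed" verb and "X was V by Y" passives. The regular expressions, the `Roles` tuple and `label_roles` are hypothetical names introduced here.

```python
import re
from typing import NamedTuple, Optional


class Roles(NamedTuple):
    subject: str
    agent: Optional[str]
    patient: Optional[str]


# Hypothetical patterns covering only very simple English clauses.
PASSIVE = re.compile(r"^(?P<subj>.+?) (?:was|were|is|are) (?P<verb>\w+) by (?P<agent>.+)$", re.I)
ACTIVE = re.compile(r"^(?P<subj>.+?) (?P<verb>\w+ed) (?P<obj>.+)$", re.I)


def label_roles(sentence: str) -> Optional[Roles]:
    s = sentence.strip().rstrip(".")
    m = PASSIVE.match(s)
    if m:
        # Passive: the grammatical subject is the patient; the by-phrase supplies the agent.
        return Roles(subject=m["subj"], agent=m["agent"], patient=m["subj"])
    m = ACTIVE.match(s)
    if m:
        # Active: subject and agent coincide; the object is the patient.
        return Roles(subject=m["subj"], agent=m["subj"], patient=m["obj"])
    return None  # outside the toy grammar (e.g. irregular verbs such as "met")
```

On "The little girl was bitten by the dog." it reports "The little girl" as subject and "the dog" as agent, matching the distinction drawn in the text; anything beyond these two patterns needs a trained labeler.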
Was developed by Daniel Gildea and Daniel Jurafsky based on FrameNet in 2002. Semantic Role Labeling has since become one of the standard tasks in natural language processing, with the latest version (1.7) of FrameNet now fully supported in the Natural Language Toolkit. Since frames are essentially semantic descriptions, they are similar across languages, and several projects have arisen over
Was extended to answer 15 math question types. MathQA methods need to combine natural and formula language. One possible approach is to perform supervised annotation via Entity Linking. The "ARQMath Task" at CLEF 2020 was launched to address the problem of linking newly posted questions from the platform Math Stack Exchange to existing ones that were already answered by the community. Providing hyperlinks to already answered, semantically related questions helps users to get answers earlier but
Was kicked by the boy", the ball is the grammatical subject, but the boy is still the agent. Many sentences in English and other Indo-European languages have the agent as subject. The use of some transitive verbs denoting strictly reciprocal events may involve a conflation of agent and subject. In the sentence "John met Sylvia", for example, though both John and Sylvia would equally meet Dowty's definition of
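Dowty's proposal, as summarized in this article (the nominal with the most Proto-Agent properties and the fewest Proto-Patient properties tends to be treated as the agent), can be read as a counting rule over the properties a verb entails for each argument. The sketch below uses the Proto-Agent properties listed here (volition, sentience or perception, causing a change of state, movement). The Proto-Patient set is an assumption commonly attributed to Dowty rather than something this text lists, and collapsing "most" and "fewest" into a single difference score is a simplification. A tie returns `None`, mirroring cases such as "April precedes May", where the roles must be specified for each verb.

```python
from typing import Optional

# Proto-Agent properties as listed in this article.
PROTO_AGENT = {"volition", "sentience", "causes_change", "moves"}
# Assumed Proto-Patient properties (not listed in this article).
PROTO_PATIENT = {"changes_state", "incremental_theme", "causally_affected", "stationary"}


def proto_score(properties: set[str]) -> int:
    # Simplification: number of Proto-Agent properties minus number of Proto-Patient properties.
    return len(properties & PROTO_AGENT) - len(properties & PROTO_PATIENT)


def select_agent(arguments: dict[str, set[str]]) -> Optional[str]:
    """Return the argument most likely to be treated as agent, or None on a tie."""
    ranked = sorted(arguments, key=lambda name: proto_score(arguments[name]), reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and proto_score(arguments[ranked[0]]) == proto_score(arguments[ranked[1]]):
        return None  # no principled choice: the verb itself must fix the roles
    return ranked[0]
```

For "Jack kicked the ball", giving Jack all four Proto-Agent properties and the ball only "causally_affected" selects Jack; two arguments with no entailed properties, as with "April" and "May", tie and yield `None`.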