Winograd schema challenge - Misplaced Pages

The Winograd schema challenge (WSC) is a test of machine intelligence proposed in 2012 by Hector Levesque, a computer scientist at the University of Toronto. Designed to be an improvement on the Turing test, it is a multiple-choice test that employs questions of a very specific structure: they are instances of what are called Winograd schemas, named after Terry Winograd, professor of computer science at Stanford University.


On the surface, Winograd schema questions simply require the resolution of anaphora: the machine must identify the antecedent of an ambiguous pronoun in a statement. This makes it a task of natural language processing, but Levesque argues that for Winograd schemas, the task requires the use of knowledge and commonsense reasoning. The challenge has been considered defeated since 2019, when a number of transformer-based language models achieved accuracies of over 90%. The Winograd Schema Challenge

A current or future or past office-holder, the office in a strict legal sense, or the office in a general sense which includes activities a mayor might conduct, might even be expected to conduct, while they may not be explicitly defined for this office. The term anaphor is used in a special way in the generative grammar tradition. Here it denotes what would normally be called a reflexive or reciprocal pronoun, such as himself or each other in English, and analogous forms in other languages. The use of

A dialog or text, or pointing to the right in languages that are written from left to right: Ancient Greek καταφορά (kataphorá, "a downward motion"), from κατά (katá, "downwards") + φέρω (phérō, "I carry"). A pro-form is a cataphor when it points to its right toward its postcedent. Both effects together are called either anaphora (broad sense) or less ambiguously, along with self-reference they comprise

A dialog or text, such as referring to the left when an anaphor points to its left toward its antecedent in languages that are written from left to right. Etymologically, anaphora derives from Ancient Greek ἀναφορά (anaphorá, "a carrying back"), from ἀνά (aná, "up") + φέρω (phérō, "I carry"). In this narrow sense, anaphora stands in contrast to cataphora, which sees the act of referring forward in

A dialog or text. Anaphora is an important concept for different reasons and on different levels: first, anaphora indicates how discourse is constructed and maintained; second, anaphora binds different syntactical elements together at the level of the sentence; third, anaphora presents a challenge to natural language processing in computational linguistics, since the identification of the reference can be difficult; and fourth, anaphora partially reveals how language

A fragment of what someone says using the pronoun her, you might never discover who she is, though if you heard the rest of what the speaker was saying on the same occasion, you might discover who she is, either by anaphoric revelation or by exophoric implication because you realize who she must be according to what else is said about her even if her identity is not explicitly mentioned, as in

A so-called Winograd schema that is "too easy": The women stopped taking pills because they were [pregnant/carcinogenic]. Which individuals were [pregnant/carcinogenic]? The answer to this question can be determined on the basis of selectional restrictions: in any situation, pills do not get pregnant, women do; women cannot be carcinogenic, but pills can. Thus this answer could be derived without
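The "too easy" case can be illustrated with a minimal sketch in which a lookup table alone picks the antecedent, with no reasoning about the sentence. The two tables and the function name `resolve_by_selection` are hypothetical and exist only for this illustration; the type labels "animate" and "substance" are assumed stand-ins for real selectional-restriction data.

```python
# Hypothetical toy lexicon: the kind of referent each predicate accepts.
SELECTS = {
    "pregnant": {"animate"},
    "carcinogenic": {"substance"},
}

# Hypothetical semantic types for the two candidate antecedents.
TYPES = {
    "women": {"animate"},
    "pills": {"substance"},
}


def resolve_by_selection(predicate, candidates):
    """Return the candidates whose semantic type satisfies the predicate."""
    allowed = SELECTS[predicate]
    return [c for c in candidates if TYPES[c] & allowed]


for word in ("pregnant", "carcinogenic"):
    print(word, "->", resolve_by_selection(word, ["women", "pills"]))
```

Because exactly one candidate survives the filter for each special word, a schema like this does not test commonsense reasoning, which is why such questions have to be excluded.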

A text-only channel (such as teletype). In general, the machine passes the test if interrogators are not able to tell the difference between it and a human in a five-minute conversation. Nuance Communications announced in July 2014 that it would sponsor an annual WSC competition, with a prize of $25,000 for the best system that could match human performance. However, the prize is no longer offered. The performance of Eugene Goostman exhibited some of

Is Elizabeth does not know, even if an English-UK Queen Elizabeth becomes indicated, if this queen means Queen Elizabeth I or Queen Elizabeth II and must await further clues in additional communications. Similarly, in discussing 'The Mayor' (of a city), the Mayor's identity must be understood broadly through the context which the speech references as general 'object' of understanding; is a particular human person meant,

Is a measure of how much a particular word in a particular role (like subject or direct object) matches the selectional preference of a particular predicate. For example, the word cake has a high thematic fit as a direct object for cut. The concepts of c-selection and subcategorization overlap in meaning and use to a significant degree. If there is a difference between these concepts, it resides with

Is a semantic concept, whereas subcategorization is a syntactic one. Selection is closely related to valency, a term used in other grammars than the Chomskian generative grammar, for a similar phenomenon. The following pairs of sentences will illustrate the concept of selection: The # indicates semantic deviance. The predicate is wilting selects a subject argument that is a plant or is plant-like. Similarly,


Is concerned, eats subcategorizes for its object argument beans only. This difference between c-selection and subcategorization depends crucially on the understanding of subcategorization. An approach to subcategorization that sees predicates as subcategorizing for their subject arguments as well as for their object arguments will draw no distinction between c-selection and subcategorization;

Is not yet present in the discourse, since the pronoun's referent has not been previously introduced, including the case of 'everything but' what has been introduced. The set of ice-cream-eating-children in example b is introduced into the discourse, but then the pronoun they refers to the set of non-ice-cream-eating-children, a set which has not been explicitly mentioned. Both semantic and pragmatic considerations attend this phenomenon, which following discourse representation theory since

Is of interest in shedding light on brain access to information, calculation, mental modeling, and communication. There are many theories that attempt to explain how anaphors are related and trace back to their antecedents, with centering theory (Grosz, Joshi, and Weinstein 1983) being one of them. Taking the computational theory of mind view of language, centering theory gives a computational analysis of underlying antecedents. In their original theory, Grosz, Joshi, and Weinstein (1983) propose that some discourse entities in utterances are more "central" than others, and this degree of centrality imposes constraints on what can be

Is rather present in the situational context. Deictic pro-forms are stereotypical exophors, e.g. Exophors cannot be anaphors as they do not substantially refer within the dialog or text, though there is a question of what portions of a conversation or document are accessed by a listener or reader with regard to whether all references to which a term points within that language stream are noticed (i.e., if you hear only

Is the use of an expression whose interpretation depends upon another expression in context (its antecedent). In a narrower sense, anaphora is the use of an expression that depends specifically upon an antecedent expression and thus is contrasted with cataphora, which is the use of an expression that depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in

Is understood and processed, which is relevant to fields of linguistics interested in cognitive psychology. The term anaphora is actually used in two ways. In a broad sense, it denotes the act of referring. Any time a given expression (e.g. a pro-form) refers to another contextual entity, anaphora is present. In a second, narrower sense, the term anaphora denotes the act of referring backwards in

The c- or s-, s-selection is usually understood. The b-sentences above do not contain violations of the c-selectional restrictions of the predicates is wilting and drank; they are, rather, well-formed from a syntactic point of view (hence #, not *), for the arguments the building and a car satisfy the c-selectional restrictions of their respective predicates, these restrictions requiring their arguments to be nouns or noun phrases. Just

The syntactic category of their complement arguments - e.g. noun (phrase), verb (phrase), adjective (phrase), etc. - i.e. they determine the syntactic category of their complements. In contrast, predicates s-select the semantic content of their arguments. Thus s-selection is a semantic concept, whereas c-selection is a syntactic one. When the term selection or selectional restrictions appears alone without

The Turing test's problems. Levesque identifies several major issues, summarized as follows: The key factor in the WSC is the special format of its questions, which are derived from Winograd schemas. Questions of this form may be tailored to require knowledge and commonsense reasoning in a variety of domains. They must also be carefully written not to betray their answers by selectional restrictions or statistical information about

The answer to this schema has to do with our understanding of the typical relationships between and behavior of councilmen and demonstrators. Since the original proposal of the Winograd schema challenge, Ernest Davis, a professor at New York University, has compiled a list of over 140 Winograd schemas from various sources as examples of the kinds of questions that should appear on the Winograd schema challenge. A Winograd schema challenge question consists of three parts: A machine will be given


The antecedent. In the theory, there are different types of centers: forward facing, backwards facing, and preferred. A ranked list of discourse entities in an utterance. The ranking is debated, some focusing on theta relations (Yıldırım et al. 2004) and some providing definitive lists. The highest ranked discourse entity in the previous utterance. The highest ranked discourse entity in

The case of homophoric reference). A listener might, for example, realize through listening to other clauses and sentences that she is a Queen because of some of her attributes or actions mentioned. But which queen? Homophoric reference occurs when a generic phrase obtains a specific meaning through knowledge of its context. For example, the referent of the phrase the Queen (using an emphatic definite article, not

The category of endophora. Examples of anaphora (in the narrow sense) and cataphora are given next. Anaphors and cataphors appear in bold, and their antecedents and postcedents are underlined: A further distinction is drawn between endophoric and exophoric reference. Exophoric reference occurs when an expression, an exophor, refers to something that is not directly present in the linguistic context, but

The challenge did not proceed to the second round. The organizing committee in 2016 was Leora Morgenstern, Ernest Davis, and Charles Ortiz. In 2017, a neural association model designed for commonsense knowledge acquisition achieved 70% accuracy on 70 manually selected problems from the original 273-problem Winograd schema dataset. In June 2018, a score of 63.7% accuracy was achieved on the full dataset using an ensemble of recurrent neural network language models, marking

The demonstrators a permit because they advocated violence. The schema challenge question is, "Does the pronoun 'they' refer to the city councilmen or the demonstrators?" Switching between the two instances of the schema changes the answer. The answer is immediate for a human reader, but proves difficult to emulate in machines. Levesque argues that knowledge plays a central role in these problems:
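As a minimal sketch of this format, the councilmen schema can be stored as one sentence template plus the two special words that produce its instances. The class `WinogradSchema` and its field names are hypothetical, chosen only for this illustration. The mapping from each special word to its answer reflects the usual human reading ("feared" points to the councilmen, "advocated" to the demonstrators) and is supplied here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """A sentence template with one slot, an ambiguous pronoun, and two candidates."""
    template: str      # contains the placeholder "{word}"
    pronoun: str       # the ambiguous pronoun to resolve
    candidates: tuple  # the two possible antecedents
    answers: dict      # special word -> antecedent a human would choose

    def instances(self):
        """Return one (sentence, question, answer) triple per special word."""
        question = (
            f"Does the pronoun '{self.pronoun}' refer to "
            f"{self.candidates[0]} or {self.candidates[1]}?"
        )
        return [
            (self.template.format(word=word), question, answer)
            for word, answer in self.answers.items()
        ]


councilmen = WinogradSchema(
    template=("The city councilmen refused the demonstrators a permit "
              "because they {word} violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answers={"feared": "the city councilmen",
             "advocated": "the demonstrators"},
)

for sentence, question, answer in councilmen.instances():
    print(sentence)
    print(question, "->", answer)
```

Keeping both instances in one record makes the defining property easy to check: the two sentences differ by a single word, yet the expected answer flips.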

The early 1980s, such as work by Kamp (1981) and Heim (File Change Semantics, 1982), and generalized quantifier theory, such as work by Barwise and Cooper (1981), was studied in a series of psycholinguistic experiments in the early 1990s by Moxey and Sanford (1993) and Sanford et al. (1994). In complement anaphora as in the case of the pronoun in example b, this anaphora refers to some sort of complement set (i.e. only to

The first use of deep neural networks that learn from independent corpora to acquire common sense knowledge. In 2019, a score of 90.1% was achieved on the original Winograd schema dataset by fine-tuning of the BERT language model with appropriate WSC-like training data to avoid having to learn commonsense reasoning. The general language model GPT-3 achieved a score of 88.3% without specific fine-tuning in 2020. A more challenging, adversarial "Winogrande" dataset of 44,000 problems

The following example a, the anaphoric pronoun they refers to the children who are eating the ice-cream. Contrastingly, example b has they seeming to refer to the children who are not eating ice-cream: In its narrower definition, an anaphoric pronoun must refer to some noun (phrase) that has already been introduced into the discourse. In complement anaphora cases, however, the anaphor refers to something that

The less specific a Queen, but also not the more specific Queen Elizabeth) must be determined by the context of the utterance, which would identify the identity of the queen in question. Until further revealed by additional contextual words, gestures, images or other media, a listener would not even know what monarchy or historical period is being discussed, and even after hearing her name

The major concerns with the Turing test is that a machine could easily pass the test with brute force and/or trickery, rather than true intelligence. The Winograd schema challenge was proposed in 2012 in part to ameliorate the problems that came to light with the nature of the programs that performed well on the test. Turing's original proposal was what he called the imitation game, which involves free-flowing, unrestricted conversations in English between human judges and computer programs over


The predicate drank selects an object argument that is a liquid or is liquid-like. A building cannot normally be understood as wilting, just as a car cannot normally be interpreted as a liquid. The b-sentences are possible only given an unusual context that establishes appropriate metaphorical meaning. The deviance of the b-sentences is addressed in terms of selection. The selectional restrictions of

The predicates is wilting and drank are violated. When a mismatch between a selector and a selected element triggers reinterpretation of the meaning of those elements, that process is referred to as coercion. One sometimes encounters the terms s(emantic)-selection and c(ategory)-selection. The concept of c-selection overlaps to an extent with subcategorization. Predicates c-select

The previous utterance realised in the current utterance. Selection (linguistics) In linguistics, selection denotes the ability of predicates to determine the semantic content of their arguments. Predicates select their arguments, which means they limit the semantic content of their arguments. One sometimes draws a distinction between types of selection; one acknowledges both s(emantic)-selection and c(ategory)-selection. Selection in general stands in contrast to subcategorization: predicates both select and subcategorize for their complement arguments, whereas they only select their subject arguments. Selection

The prize in 2016 and the 2018 competition was cancelled for lack of prospects; the prize is no longer offered. The Twelfth International Symposium on the Logical Formalizations of Commonsense Reasoning was held on March 23–25, 2015 at the AAAI Spring Symposium Series at Stanford University, with a special focus on the Winograd schema challenge. The organizing committee included Leora Morgenstern (Leidos), Theodore Patkos (The Foundation for Research & Technology Hellas), and Robert Sloan (University of Illinois at Chicago). The 2016 Winograd Schema Challenge

The problem in a standardized form which includes the answer choices, thus making it a binary decision problem. The Winograd schema challenge has the following purported advantages: One difficulty with the Winograd schema challenge is the development of the questions. They need to be carefully tailored to ensure that they require commonsense reasoning to solve. For example, Levesque gives the following example of

The s-selectional restrictions of the predicates is wilting and drank are violated in the b-sentences. Selectional constraints or selectional preferences describe the degree of s-selection, in contrast to selectional restrictions which treat s-selection as a binary, yes or no. Selectional preferences have often been used as a source of linguistic information in natural language processing applications. Thematic fit

The sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally. In the sentence Before her arrival, nobody saw Sally, the pronoun her refers forward to the postcedent Sally, so her is now a cataphor (and an anaphor in the broader, but not the narrower, sense). Usually, an anaphoric expression is a pro-form or some other kind of deictic (contextually dependent) expression. Both anaphora and cataphora are species of endophora, referring to something mentioned elsewhere in

The set of non-ice-cream-eating-children) or to the maximal set (i.e. to all the children, both ice-cream-eating-children and non-ice-cream-eating-children) or some hybrid or variant set, including potentially one of those noted to the right of example b. The various possible referents in complement anaphora are discussed by Corblin (1996), Kibble (1997), and Nouwen (2003). Resolving complement anaphora

The status of the subject argument. Traditionally, predicates are interpreted as NOT subcategorizing for their subject argument because the subject argument appears outside of the minimal VP containing the predicate. Predicates do, however, c-select their subject arguments, e.g. The predicate eats c-selects both its subject argument Fred and its object argument beans, but as far as subcategorization

The term anaphor in this narrow sense is unique to generative grammar, and in particular, to the traditional binding theory. This theory investigates the syntactic relationship that can or must hold between a given pro-form and its antecedent (or postcedent). In this respect, anaphors (reflexive and reciprocal pronouns) behave very differently from, for instance, personal pronouns. In some cases, anaphora may refer not to its usual antecedent, but to its complement set. In


The use of reasoning, or any understanding of the sentences' meaning; all that is necessary is data on the selectional restrictions of pregnant and carcinogenic. In 2016 and 2018, Nuance Communications sponsored a competition, offering a grand prize of $25,000 for the top scorer above 90% (for comparison, humans correctly answer 92–96% of WSC questions). However, nobody came close to winning

The words in the sentence. The first cited example of a Winograd schema (and the reason for their name) is due to Terry Winograd: The city councilmen refused the demonstrators a permit because they [feared/advocated] violence. The choices of "feared" and "advocated" turn the schema into its two instances: The city councilmen refused the demonstrators a permit because they feared violence. The city councilmen refused

Was designed in 2019. This dataset consists of fill-in-the-blank style sentences, as opposed to the pronoun format of previous datasets. A version of the Winograd schema challenge is one part of the GLUE (General Language Understanding Evaluation) benchmark collection of challenges in automated natural-language understanding. Anaphora (linguistics) In linguistics, anaphora (/əˈnæfərə/)

Was proposed in the spirit of the Turing test. Proposed by Alan Turing in 1950, the Turing test plays a central role in the philosophy of artificial intelligence. Turing proposed that, instead of debating whether a machine can think, the science of AI should be concerned with demonstrating intelligent behavior, which can be tested. But the exact nature of the test Turing proposed has come under scrutiny, especially since an AI chatbot named Eugene Goostman claimed to pass it in 2014. One of

Was run on July 11, 2016 at IJCAI-16. There were four contestants. The first round of the contest was to solve PDPs (pronoun disambiguation problems), adapted from literary sources, not constructed as pairs of sentences. The highest score achieved was 58% correct, by Quan Liu et al., of the University of Science and Technology of China. Hence, by the rules of that challenge, no prizes were awarded, and
