Rule-based machine translation

Rule-based machine translation (RBMT; the "Classical Approach" of MT) is an approach to machine translation based on linguistic information about the source and target languages, retrieved mainly from (unilingual, bilingual or multilingual) dictionaries and grammars covering the main semantic, morphological, and syntactic regularities of each language. Given input sentences in a source language, an RBMT system transforms them into output sentences in a target language on the basis of morphological, syntactic, and semantic analysis of both the source and target languages involved in a concrete translation task. RBMT has been progressively superseded by more efficient methods, particularly neural machine translation.


29-550: The first RBMT systems were developed in the early 1970s. The most important steps of this evolution were the emergence of the following RBMT systems: Today, other common RBMT systems include: There are three different types of rule-based machine translation systems: RBMT systems can also be characterized as the opposite of example-based machine translation systems, whereas hybrid machine translation systems make use of many principles derived from RBMT. The main approach of RBMT systems

58-427: A lexicon. In NLP, ontologies can be used as a source of knowledge for machine translation systems. With access to a large knowledge base, rule-based systems can be enabled to resolve many (especially lexical) ambiguities on their own. In the following classic examples, as humans, we are able to interpret the prepositional phrase according to the context because we use our world knowledge, stored in our lexicons: I saw

87-632: A man/star/molecule with a microscope/telescope/binoculars. Since the syntax does not change, a traditional rule-based machine translation system may not be able to differentiate between the meanings. With a large enough ontology as a source of knowledge, however, the possible interpretations of ambiguous words in a specific context can be reduced. The ontology generated for the PANGLOSS knowledge-based machine translation system in 1993 may serve as an example of how an ontology for NLP purposes can be compiled: The RBMT system contains: The RBMT system makes use of
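The way an ontology can narrow the readings of "with a microscope/telescope/binoculars" may be sketched as follows. This is a minimal illustration only: the `ONTOLOGY` entries, the `CAN_POSSESS` set, and the `pp_readings` function are hypothetical names invented here, and none of the data comes from PANGLOSS.

```python
# Toy ontology: every entry is an illustrative assumption, not PANGLOSS data.
ONTOLOGY = {
    "man": {"is_a": "human"},
    "star": {"is_a": "celestial_body"},
    "molecule": {"is_a": "microscopic_object"},
    "microscope": {"is_a": "optical_instrument"},
    "telescope": {"is_a": "optical_instrument"},
    "binoculars": {"is_a": "optical_instrument"},
}

# Categories assumed to be able to hold or carry an instrument.
CAN_POSSESS = {"human"}


def pp_readings(object_noun, pp_noun):
    """Plausible readings of 'I saw a <object_noun> with a <pp_noun>'."""
    readings = []
    if ONTOLOGY[pp_noun]["is_a"] == "optical_instrument":
        # The device was used to do the seeing.
        readings.append("instrument")
    if ONTOLOGY[object_noun]["is_a"] in CAN_POSSESS:
        # The thing seen could itself be holding the device.
        readings.append("possession")
    return readings


print(pp_readings("star", "telescope"))
print(pp_readings("man", "binoculars"))
```

In this sketch a star cannot hold a telescope, so only the instrument reading survives, whereas a man with binoculars remains ambiguous and would need further context.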

116-498: A phrasal verb, but only when the combination of verb and preposition is not intuitive to the learner: Further examples: Sometimes both phenomena can occur in the same context. In general, the discrete meanings associated with phrasal verbs cannot be readily understood solely by construing the sum of their respective parts: the meaning of pick up is distinct from the various meanings of pick and up, and may acquire disparate meanings depending on its contextual usage. Similarly,

145-408: A preposition and must be a particle. But even with a particle verb, shifting the particle is not always possible, for example if it is followed by a pronoun instead of a noun, or if there is a fixed collocation. A second diagnostic is to think about where the instinctive division would be if we had to take a breath in the middle of the phrase. A particle would naturally be grouped with the preceding verb,

174-401: A preposition with the following noun phrase. In the following examples, which show both of these approaches, an asterisk indicates an impossible form. A third test, which probes further into the question of the natural division, would be to insert an adverb or adverbial between the verb and the particle/preposition. This is possible with a following prepositional phrase, but not if the adverbial

203-414: A prepositional phrase can complement a particle verb, some explanations distinguish three types of phrasal verb constructions depending on whether the verb combines with a particle, a prepositional phrase, or both, though the third type is not a distinct linguistic phenomenon. Finally, some linguists reject the term altogether. Particle verbs (phrasal verbs in the strict sense) are two-word verbs composed of

232-541: A simple verb and a particle extension that modifies its meaning. The particle is thus integrally collocated with the verb. In older grammars, the particle was usually analyzed as an adverb. In these examples, the common verbs grow and give are complemented by the particles up and in. The resulting two-word verbs are single semantic units, so grow up and give in are listed as discrete entries in modern dictionaries. These verbs can be transitive or intransitive. If they are transitive, i.e. if they have an object,

261-436: A verb followed by an adverb and/or a preposition, which is called the particle of the verb. Phrasal verbs produce specialized context-specific meanings that may not be derivable from the meanings of their constituents. There is almost always an ambiguity during word-for-word translation from the source to the target language. As an example, consider the phrasal verb "put on" and its Hindustani translation. It may be used in any of

290-423: Is based on linking the structure of the given input sentence with the structure of the demanded output sentence, necessarily preserving their unique meaning. The following example can illustrate the general frame of RBMT: Minimally, to get a German translation of this English sentence one needs: And finally, we need rules according to which one can relate these two structures together. Accordingly, we can state
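The general frame described here (a bilingual dictionary, grammatical information for both languages, and rules relating the two structures) can be sketched for a single English-to-German sentence pattern. Everything below is an assumption made for illustration: the sample sentence "A girl eats an apple.", the `LEXICON` and `ARTICLE` tables, and the `translate` function are hypothetical and cover only one subject-verb-object pattern.

```python
# Bilingual lexicon: English word -> (German word, part of speech, gender).
# All entries are illustrative assumptions.
LEXICON = {
    "a": ("ein", "DET", None),
    "an": ("ein", "DET", None),
    "girl": ("Mädchen", "NOUN", "neut"),
    "apple": ("Apfel", "NOUN", "masc"),
    "eats": ("isst", "VERB", None),
}

# German indefinite article forms indexed by (case, gender).
ARTICLE = {
    ("nom", "masc"): "ein", ("nom", "fem"): "eine", ("nom", "neut"): "ein",
    ("acc", "masc"): "einen", ("acc", "fem"): "eine", ("acc", "neut"): "ein",
}


def translate(sentence):
    """Translate one 'DET NOUN VERB DET NOUN' English sentence into German."""
    words = sentence.rstrip(".").lower().split()
    tagged = [LEXICON[w] for w in words]  # dictionary lookup (analysis)
    pattern = [pos for _, pos, _ in tagged]
    # Source grammar rule: the only structure this sketch accepts.
    if pattern != ["DET", "NOUN", "VERB", "DET", "NOUN"]:
        raise ValueError("sentence does not match the supported pattern")
    subject, verb, obj = tagged[1], tagged[2], tagged[4]
    # Transfer and generation: keep the word order, but make each article
    # agree with its noun in case (subject = nominative, object = accusative)
    # and gender.
    out = [
        ARTICLE[("nom", subject[2])], subject[0],
        verb[0],
        ARTICLE[("acc", obj[2])], obj[0],
    ]
    text = " ".join(out)
    return text[0].upper() + text[1:] + "."


print(translate("A girl eats an apple."))  # Ein Mädchen isst einen Apfel.
```

Even this small case shows why word-for-word lookup is not enough: the article before "Apfel" must become "einen", which requires knowing the noun's gender and its role in the sentence.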

319-401: Is distinct, and modern dictionaries may list, for example, to (particle) and to (preposition) as separate lexemes. In the particle verb construction, they cannot be construed as prepositions because they are not being used as part of a prepositional phrase. Many verbs can be complemented by a prepositional phrase that functions adverbially: This construction is sometimes also taught as

SECTION 10


348-604: Is encoded to example-based machine translation through the example translations that are used to train such a system. Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation. Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially adapted to translation between two totally different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in another language; therefore, it

377-468: Is intruding between the two parts of a particle verb. A fourth test would be to place the verb in a wh-question (which? who?) or a relative clause and consider whether the particle/preposition can be placed before the question word or relative pronoun. While this may sound antiquated, it is always possible with a preposition, never with a particle. (For more on an obsolete prescriptive rule about this, see preposition stranding.) While this distinction

406-414: Is no use to do the deep linguistic analysis characteristic of rule-based machine translation. Example-based machine translation systems are trained from bilingual parallel corpora containing sentence pairs like the example shown in the table above. Sentence pairs contain sentences in one language with their translations into another. The particular example is a minimal pair, meaning that

435-439: Is of interest to linguists, it is not necessarily important for language learners, and some textbooks recommend learning phrasal verbs as whole collocations without considering types. A complex aspect of phrasal verbs concerns the syntax of particle verbs that are transitive (as discussed and illustrated above). These allow some variability, depending on the relative weight of the constituents involved. Shifting often occurs when

464-749: Is related to the history of particle verbs, which developed out of Old English prefixed verbs. By contrast, compounds which put the particle second are a more modern development in English, and focus more on the action expressed by the compound. Prepositional verbs are very common in many languages, though they would not necessarily be analyzed as a distinct verb type: they are simply verbs followed by prepositional phrases. By contrast, particle verbs are much rarer in cross-language comparison, and their origins need some explanation. Middle English particle verbs developed from Old English prefixed verbs: OE inngan > English go in. English phrasal verbs are related to

493-572: The OED editor Henry Bradley suggested it to him. This terminology is mainly used in English as a second language teaching. Some textbooks apply the term "phrasal verb" primarily to verbs with particles in order to distinguish phrasal verbs from verb phrases composed of a verb and a collocated preposition. Others include verbs with prepositions under the same category and distinguish particle verbs and prepositional verbs as two types of phrasal verbs. Since

522-415: The concept of phrasal verb occurs via compounding when a verb+particle complex is nominalized. The particle may come before or after the verb. If it comes after, there may be a hyphen between the two parts of the compound noun. Compounds which place the particle before the verb are of ancient development, and are common to all Germanic languages, as well as to Indo-European languages in general. This

551-426: The following stages of translation: Often only partial parsing is sufficient to get to the syntactic structure of the source sentence and to map it onto the structure of the target sentence. An ontology is a formal representation of knowledge that includes the concepts (such as objects, processes etc.) in a domain and some relations between them. If the stored information is of linguistic nature, one can speak of

580-414: The following ways: Phrasal verb In the traditional grammar of Modern English, a phrasal verb typically constitutes a single semantic unit consisting of a verb followed by a particle (e.g., turn down, run into, or sit up), sometimes collocated with a preposition (e.g., get together with, run out of, or feed off of). Phrasal verbs ordinarily cannot be understood based upon

609-408: The following: Example-based machine translation Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning. At

SECTION 20


638-618: The foundation of example-based machine translation is the idea of translation by analogy. When applied to the process of human translation, the idea that translation takes place by analogy is a rejection of the idea that people translate sentences by doing deep linguistic analysis. Instead, it is founded on the belief that people translate by first decomposing a sentence into certain phrases, then by translating these phrases, and finally by properly composing these fragments into one long sentence. Phrasal translations are translated by analogy to previous translations. The principle of translation by analogy

667-472: The meaning of hang out is not conspicuously related to a particular definition of hang or out. When a particle verb is transitive, it may be difficult to distinguish it from a prepositional verb. A simple diagnostic which works in many cases is to consider whether it is possible to shift the preposition/particle to after the noun. An English preposition can never follow its noun, so if we can change verb - P - noun to verb - noun - P, then P cannot be

696-520: The meanings of the individual parts alone but must be considered as a whole: the meaning is non-compositional and thus unpredictable. Phrasal verbs are differentiated from other classifications of multi-word verbs and free combinations by the criteria of idiomaticity, replacement by a single verb, wh-question formation and particle movement. The term phrasal verb was popularized by Logan Pearsall Smith in Words and Idioms (1925), in which he states that

725-451: The object is very light, e.g. Shifting occurs between two (or more) sister constituents that appear on the same side of their head. The lighter constituent shifts leftward and the heavier constituent shifts rightward, and this happens to accommodate the relative weight of the two. Dependency grammar trees are again used to illustrate the point: The trees illustrate when shifting can occur. English sentence structures that grow down and to

754-403: The particle may come either before or after the object of the verb. When the object is a pronoun, the particle is usually placed afterwards. With nouns, it is a matter of familiar collocation or of emphasis. Particles commonly used in this construction include to, in, into, out, up, down, at, on, off, under, against. All these words can also be used as prepositions, but the prepositional use

783-412: The right are easier to process. There is a consistent tendency to place heavier constituents to the right, as is evident in the a-trees. Shifting is possible when the resulting structure does not contradict this tendency, as is evident in the b-trees. Note again that the particle verb constructions (in orange) qualify as catenae in both the a- and b-trees. Shifting does not alter this fact. An extension of

812-420: The sentences vary by just one element. These sentences make it simple to learn translations of portions of a sentence. For example, an example-based machine translation system would learn three units of translation from the above example: These units can then be composed to produce novel translations. For example, if we have been trained using some text containing the sentences: President Kennedy

841-416: Was shot dead during the parade. and The convict escaped on July 15th., then we could translate the sentence The convict was shot dead during the parade. by substituting the appropriate parts of the sentences. Example-based machine translation is best suited for sub-language phenomena like phrasal verbs. Phrasal verbs have highly context-dependent meanings. They are common in English, where they comprise
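The substitution of sentence parts described above can be sketched as a lookup over stored fragment pairs. This is a rough illustration, not how any particular EBMT system works: the `FRAGMENTS` table, the German target strings, and the `translate_by_analogy` function are assumptions introduced here, and the greedy longest-match strategy is just one simple way to cover a sentence.

```python
# Fragment pairs an EBMT system might have extracted from aligned example
# sentences. The German sides are illustrative assumptions.
FRAGMENTS = {
    "President Kennedy": "Präsident Kennedy",
    "The convict": "Der Sträfling",
    "was shot dead during the parade": "wurde während der Parade erschossen",
    "escaped on July 15th": "entkam am 15. Juli",
}


def translate_by_analogy(sentence):
    """Cover the sentence left to right with the longest stored fragments."""
    rest = sentence.rstrip(".").strip()
    output = []
    while rest:
        # Pick the longest stored source fragment that the remainder starts with.
        match = max(
            (frag for frag in FRAGMENTS if rest.startswith(frag)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no stored example covers: {rest!r}")
        output.append(FRAGMENTS[match])
        rest = rest[len(match):].strip()
    return " ".join(output) + "."


print(translate_by_analogy("The convict was shot dead during the parade."))
```

The new sentence never appeared in the stored examples, yet it is translated by recombining two fragments that did. The sketch keeps the fragments in source order, which happens to work for this pair; a realistic system would also need to handle reordering and agreement between recombined parts.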
