Machine Translation Using Example-Based Machine Translation

Machine translation is one of the research areas of computational linguistics, and various methods have been proposed to automate the translation process. This thesis proposes a machine translation system for translation from English to Malayalam. The system is based on the Example-Based Machine Translation (EBMT) approach, which rests on the idea of reusing already translated examples. The input to the system is an English sentence, and the corresponding Malayalam sentence is generated as output.

Introduction

Natural-language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process large amounts of natural-language data effectively. Challenges in natural-language processing frequently involve speech recognition, natural-language understanding, and natural-language generation. The advent of the World Wide Web has greatly increased the demand for software that processes text of all kinds. Over the last ten years, Internet publishing has become a commonplace activity for private individuals, commercial enterprises, and government organizations, as well as traditional media companies. Mobile information accessories such as laptops, cellular phones and personal digital assistants have become ubiquitous, supporting instant messaging.

Language is a powerful medium of communication that conveys the ideas and expressions of the human mind. There are more than 5000 languages in the world. Knowing all of these languages is not a practical solution to the language barrier in communication. In this multilingual world, with a huge amount of information exchanged between regions and in different languages in digitized form, it has become necessary to find an automated process for converting text from one language to another. Natural Language Processing (NLP) is an active area of research that explores how computers can be used to understand and manipulate natural-language text or speech. NLP has many applications, such as machine translation, cross-language information retrieval (CLIR), and speech recognition. It remains a challenging and complex research topic. Machine translation is the process of enabling a computer to translate sentences from one language to another.

A lot of research is currently going on in the area of machine translation. Many works concentrate on the translation of English into Indian regional languages, and translating Indian regional languages into English is also important for many applications. In Kerala, most of the population is not very familiar with English, although English is an international language and one of the most widely spoken languages in the world. Translation between the native language and a commonly used language is therefore essential for many applications, such as instant messaging systems and other communication systems. There are many different approaches to machine translation. Example-Based Machine Translation (EBMT) is one of the dominant types. It is a corpus-based approach, built on the idea of translation by analogy. The objective of this thesis is to build an example-based machine translation system that takes an English sentence as input and produces the corresponding Malayalam sentence as output.

Literature Review

Mosleh H. Al-Adhaileh and Tang Enya Kong [1] proposed an Example-Based Machine Translation (EBMT) system for English-Malay translation. The approach relies solely on example translations kept in a Bilingual Knowledge Bank (BKB). A flexible annotation schema called Structured String-Tree Correspondence (SSTC) is used to annotate both the source and target sentences of a translation pair. Each SSTC describes a sentence, a representation tree, and the correspondences between substrings in the sentence and subtrees in the representation tree. With both the source and target SSTCs established, a translation example in the BKB can be represented effectively as a pair of synchronous SSTCs. In the process of translation, they first build the representation tree for the source sentence (English) using an example-based parsing algorithm. By referring to the resulting source parse tree, they then synthesize the target sentence (Malay) from the target SSTCs pointed to by the synchronous SSTCs, which encode the relationship between source and target SSTCs.

Osamu Furuse and Hitoshi Iida [2] propose an effective parsing method for example-based machine translation. In this method, an input string is parsed by the top-down application of linguistic patterns consisting of variables and constituent boundaries. A constituent boundary is expressed by either a functional word or a part-of-speech bigram. When structural ambiguity occurs, the most plausible structure is selected using the total values of distance calculations in the example-based framework. Transfer-Driven Machine Translation (TDMT) achieves efficient and robust translation within the example-based framework by adopting this parsing method. The effectiveness of the method in TDMT is shown using bidirectional translation between Japanese and English.

Tantely Andriamanankasina, Kenji Araki and Koji Tochinai propose a translation method based on a recursive division of the input sentence, in which each part is translated independently. The system predicts the position at which the input sentence, or a segment of it, should be divided according to the links existing between the source sentence and the target sentence of the extracted translation example. In addition, they evaluate an analogy-based word-level alignment method that predicts word correspondences between the source and translation sentences of new translation examples. The translation method was implemented in a French-Japanese machine translation system, with spoken-language texts used as examples. Promising translation results were obtained, and the effectiveness of the alignment method in translation was confirmed.

Ralf D. Brown describes the Pangloss Example-Based Machine Translation engine (PanEBMT), a translation system requiring essentially no knowledge of the structure of a language, only a large parallel corpus of example sentences and a bilingual dictionary. Input texts are subdivided into sequences of words occurring in the corpus, for which translations are determined by sub-sentential alignment of the sentence pairs containing those sequences. These partial translations are then combined with the results of other translation engines to form the final translation produced by the Pangloss system.

Michael Carl describes an example-based machine translation (EBMT) system that relies on various knowledge resources. Morphological analyses abstract the surface forms of the languages to be translated. A shallow syntactic rule formalism is used to percolate features in derivation trees. Translation examples serve to decompose the text to be translated and determine the transfer of lexical values into the target language. Translation templates determine the word order of the target language and the type of phrases (e.g. noun phrase, prepositional phrase, ...) to be generated in the target language. An induction mechanism generalizes translation templates from translation examples. The paper outlines the basic idea underlying the EBMT system and investigates the possibilities and limits of the translation template induction process.

Theoretical Background

In the early 1990s, machine translation researchers were attracted to statistical and example-based machine translation, and by 1993 EBMT had become an established field of research. The great demand for high-quality automatic translation moved almost all researchers towards corpus-based machine translation. Many works have applied this technique to different language pairs around the world. This section describes the various approaches to machine translation and the work related to each approach.

Rule-based machine translation (RBMT) uses linguistic rules to analyze the input sentence in the source language and to generate text in the target language. RBMT formally breaks down the source and target languages using linguistic information. Its output is therefore more predictable and often more grammatical than that of other methods. An RBMT system generates output sentences on the basis of morphological, syntactic, and semantic analysis of both the source and target languages involved in the translation task.

Since 1989, the corpus-based approach has emerged as one of the most widely explored areas in machine translation. Corpus-based machine translation (CBMT) overcomes the knowledge acquisition problem of RBMT by using a huge bilingual parallel corpus to obtain the knowledge needed for translation. Such a system transforms the source-language text into an intermediate form that depends only on the target language, and the output text is generated from this intermediate form. CBMT systems are classified into Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT).

Statistical machine translation is an entirely different approach from RBMT, in that it never requires such complex linguistic rules to be written. It is a data-driven machine learning method based on huge bilingual corpora. Google Translate, in its earlier versions, was an example of SMT. The advantage of an SMT system is that linguistic knowledge is not required to build it. The difficulty lies in creating the massive parallel corpus it needs.

Example Based Machine Translation

Example acquisition

Example acquisition is the process of acquiring examples of already translated sentences to form a parallel corpus for the translation system. The corpus is the collection of examples gathered from various resources. In this work, the corpus contains examples not only at the sentence level but also at various sub-sentential levels, including words, idioms and collocations, multi-word terminology, and phrases. The work mostly uses idioms, multi-word expressions and phrases in Malayalam together with their corresponding translations.
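One way such a multi-level corpus could be stored is sketched below. This is only an illustration under stated assumptions, not the storage format used in this work: the `Example` class, the `level` labels, and the `normalize` and `build_corpus` helpers are hypothetical names. The two sentence pairs are taken from the sample translations in this document, and the word-level pairs are read off from those sentences for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Example:
    """One aligned translation example (hypothetical structure)."""
    source: str  # English side
    target: str  # Malayalam side
    level: str   # assumed labels: "word", "phrase", "idiom" or "sentence"


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so that lookups are consistent."""
    return " ".join(text.lower().split())


def build_corpus(examples: Iterable[Example]) -> Dict[str, Example]:
    """Index examples by normalized source text, keeping the first entry seen."""
    corpus: Dict[str, Example] = {}
    for ex in examples:
        corpus.setdefault(normalize(ex.source), ex)
    return corpus


# Toy data: sentence pairs from this document, word pairs assumed from them.
examples = [
    Example("We bought book", "ഞങ്ങള് പുസ്തകം വാങ്ങി", "sentence"),
    Example("I do not like milk", "എനിക്ക് പാല് ഇഷ്ടമല്ല", "sentence"),
    Example("book", "പുസ്തകം", "word"),
    Example("milk", "പാല്", "word"),
]
corpus = build_corpus(examples)
```

Keeping the level with each entry would let a later matching step prefer sentence-level examples over smaller fragments when both are available.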

Matching

The matching phase is one of the major steps in Example-Based Machine Translation. The corpus is searched for the best match for the input source sentence. Matching is concerned with finding the fragments in the corpus that match the input sentence, and with how these stored examples are used for translation. Sometimes the system cannot translate a full sentence as a whole, because no translation of the complete sentence is available in the corpus. The input sentence is then split into a set of smaller fragments. To segment the sentence, we first look in the example database (corpus) for the longest possible fragment available and select the corresponding translated fragment. We then consider the remaining part of the input sentence, for which the next matching fragment has to be found in the corpus. This process continues until the end of the input sentence. If the system does not have an extensive corpus, the matching process may not be successful.

Let S be the input English sentence, decomposed into small fragments called examples (e1, e2, e3, ...):

D(S) = e1, e2, e3, ...

where D indicates the decomposition. We then look up the corpus to get the translated fragment for each example:

T(e1) = t1
T(e2) = t2
T(e3) = t3

where T indicates the translation and t1, t2, t3 are the Malayalam fragments corresponding to e1, e2, e3 respectively.

Recombination

This is the final step in example-based machine translation. Recombination, or sentence synthesis, is the process of combining the translated fragments into the target text. Recombination thus generates the translated target sentence and improves its readability. Combining the translated chunks into a well-formed structure in the target language is the most difficult step in EBMT, yet it has always received less attention than the other steps in translation.
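The matching and recombination steps described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the function names `decompose` and `translate`, the toy `CORPUS`, and the choice to pass unmatched words through untranslated are all assumptions made for the example. Recombination here is plain concatenation with no reordering rules, and punctuation is not handled.

```python
from typing import Dict, List

# Hypothetical toy corpus: English fragment -> Malayalam fragment.
CORPUS: Dict[str, str] = {
    "we": "ഞങ്ങള്",
    "bought": "വാങ്ങി",
    "book": "പുസ്തകം",
    "bought book": "പുസ്തകം വാങ്ങി",
}


def decompose(sentence: str, corpus: Dict[str, str]) -> List[str]:
    """D(S): split the sentence greedily into the longest fragments found in the corpus."""
    words = sentence.lower().split()
    fragments: List[str] = []
    start = 0
    while start < len(words):
        # Try the longest remaining span first, then shrink it word by word.
        for end in range(len(words), start, -1):
            candidate = " ".join(words[start:end])
            if candidate in corpus:
                break
        else:
            # Nothing matched: keep the single word as its own fragment (assumption).
            end = start + 1
            candidate = words[start]
        fragments.append(candidate)
        start = end
    return fragments


def translate(sentence: str, corpus: Dict[str, str]) -> str:
    """Apply T to each fragment and recombine by simple concatenation."""
    fragments = decompose(sentence, corpus)
    # Fragments missing from the corpus are passed through untranslated.
    return " ".join(corpus.get(fragment, fragment) for fragment in fragments)


print(decompose("We bought book", CORPUS))
print(translate("We bought book", CORPUS))
```

With this toy corpus, the sentence is split into "we" and "bought book". Because the two-word fragment already carries the Malayalam object-verb order, concatenation alone gives a well-ordered result here. When only single-word fragments match, the concatenated output keeps the English word order, which is where reordering rules would be needed.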

The translation system relies completely on the corpus, which contains examples of already translated words, phrases and sentences. System performance can be improved with a larger aligned corpus and more rules for reordering.

Input English Sentence | Output Malayalam Sentence
We bought book | ഞങ്ങള് പുസ്തകം വാങ്ങി
I do not like milk | എനിക്ക് പാല് ഇഷ്ടമല്ല
We will win game | ഞാന് കളി ജയിക്കും
Does she came? | അവൾ വരാറുണ്ടോ?
My father is a doctor | എന്റെ അച്ഛന് ഡോക്ടറാണ്
He had seen the film | അവൻ ആ സിനിമ കണ്ടിരുന്നു

Discussion and Conclusion

The essential feature that characterizes a Machine Translation approach and sets it apart from other approaches is the kind of knowledge it uses. From this perspective, we argue that Example-Based Machine Translation is sometimes characterized in terms of nonessential features. We show that Example-Based Machine Translation, as long as it is linguistically principled, significantly overlaps with other linguistically principled approaches to Machine Translation. We make a proposal for translation knowledge bases that make such an overlap explicit. We relate our proposal to translation by analogy, which stands out as an inherently example-based technique. As the literature review shows, none of the surveyed works provides a working English-to-Malayalam translation system based on the example-based method. Such a system can be developed in future work.

11 February 2020