Applications of Natural Language Processing
One important application of NLP is Machine Translation (MT): “the automatic translation of text…from one natural language to another.” Existing MT systems are far from perfect; they usually output a flawed translation that requires human post-editing, so they are useful only to people familiar enough with the output language to decipher the inaccuracies. These inaccuracies are in part a result of imperfect NLP systems: without the capacity to understand a text, it is difficult to translate it. Many of the difficulties in realizing MT will be overcome when systems can resolve the pragmatic, lexical, semantic, and syntactic ambiguities of natural languages. One further difficulty in Machine Translation is text alignment. Text alignment is not part of the translation process itself, but a process that ensures the correct ordering of ideas within sentences and paragraphs in the output. The task is difficult because alignment is not a one-to-one correspondence between the source and the output.
Different languages may use entirely different phrases to convey the same message. For example, the French passage “Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d’adeptes. En effet notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment” corresponds to the English “With regard to the mineral waters and the lemonades, they encounter still more users. Indeed our survey makes stand out the sales clearly superior to those in 1987 for cola based drinks especially.” While this one-to-one translation is grammatically accurate, the sense of the passage would be better conveyed if some phrases were rearranged to read “According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above average growth rates.”

There are currently three approaches to Machine Translation: direct, semantic transfer, and interlingual. Direct translation entails a word-for-word translation and syntactic analysis. The word-for-word translation is based on the results of a bilingual dictionary query, while the syntactic analysis parses the input and regenerates the sentences according to the output language’s syntax rules. For example, the phrase “Les oiseaux jaunes” could be accurately translated as “The yellow birds” using this technique. This kind of translation is the most common today in commercial systems, such as AltaVista. However, this approach to MT does not account for semantic ambiguities in translation.
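The direct approach described above can be sketched in a few lines of Python. This is a toy illustration, not a real MT system: the bilingual lexicon and the single adjective part-of-speech set are assumptions invented for the example, and the only syntactic rule applied is the French noun–adjective to English adjective–noun reordering.

```python
# Toy sketch of direct machine translation: bilingual dictionary lookup
# plus a single syntactic reordering rule. Lexicon is illustrative only.

FR_EN = {"les": "the", "oiseaux": "birds", "jaunes": "yellow"}
ADJECTIVES = {"jaunes"}  # toy part-of-speech information

def direct_translate(sentence):
    words = sentence.lower().split()
    # Syntactic rule: French places adjectives after nouns; English before.
    for i in range(1, len(words)):
        if words[i] in ADJECTIVES:
            words[i - 1], words[i] = words[i], words[i - 1]
    # Word-for-word substitution from the bilingual dictionary.
    return " ".join(FR_EN.get(w, w) for w in words)

print(direct_translate("Les oiseaux jaunes"))  # -> the yellow birds
```

Even this tiny sketch shows the approach’s limitation: every word is substituted independently, so any word with more than one sense would be translated the same way in every context.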
The semantic transfer approach is more advanced than direct translation because it represents the meaning of sentences and contexts, not just equivalent word substitutions. It consists of a set of templates that represent the meanings of words, and a set of correspondence rules that associate those meanings with possible syntactic structures in the output language. Semantics, as well as syntax and morphology, are thus considered. This is useful because different languages use different words to convey the same meaning: in French, the phrase “Il fait chaud” corresponds to “It is hot”, not “It makes hot” as the literal translation would suggest. One limitation of this approach, however, is that each system must be tailored to a particular pair of languages.
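The “Il fait chaud” case can be sketched as a minimal semantic transfer: the source phrase is first mapped to a meaning template, and a separate generation rule realizes that meaning in English. Both tables here are illustrative assumptions, and a real system would of course use compositional rules rather than whole-phrase lookup:

```python
# Toy sketch of semantic transfer: analyze the source phrase into a
# meaning representation, then generate the target sentence from it.
# Both tables are illustrative assumptions, not a real transfer system.

# Analysis rules: French phrase -> meaning template
FR_TO_MEANING = {
    "il fait chaud": ("weather", "hot"),
    "il fait froid": ("weather", "cold"),
}

# Generation rules: meaning template -> English realization
MEANING_TO_EN = {
    ("weather", "hot"): "It is hot",
    ("weather", "cold"): "It is cold",
}

def transfer_translate(phrase):
    meaning = FR_TO_MEANING[phrase.lower()]  # representation of meaning
    return MEANING_TO_EN[meaning]            # correspondence rule for English

print(transfer_translate("Il fait chaud"))  # -> It is hot
```

Note that both rule tables are specific to the French–English pair, which is exactly the limitation mentioned above: a French–German system would need its own generation rules.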
The third and closest to ideal (and thus inherently most difficult) approach to MT is translation via an interlingua. “An interlingua is a knowledge representation formalism that is independent of the way particular languages express meaning.” This approach would form the intermediary step for translation between all languages and enable fluent communication across cultures. It depends heavily, however, on the development of a complete NLP system in which all levels of analysis and all ambiguities in natural language are resolved in a cohesive manner. Interlingual MT is mainly confined to research labs because progress has not yet been sufficient to produce accurate translation software for commercial use. Although MT over a large domain is as yet unrealized, MT systems over certain limited contexts have been almost perfected. This idea of closed context is essentially the concept used when developing SHRDLU: one can build a more nearly perfect system by constraining the context of the input, which eliminates many ambiguities and difficulties outright. These closed contexts need not concern a particular subject matter (as SHRDLU was confined to the subject of toy blocks) but can also take the form of controlled language. For example, “at Xerox technical authors are obliged to compose documents in what is called Multinational Customized English, where not only the use of specific terms is laid down, but also are the construction of sentences.” Hence their MT systems can accurately handle these texts, and the documents can be generated automatically in different languages. Again, before MT over an unrestricted domain can be perfected, further research and development must be done in the field of NLP.
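The architectural appeal of the interlingua can be shown with a toy sketch: each language needs only one analyzer (into the language-neutral representation) and one generator (out of it), so any pair of languages can be bridged without pair-specific rules. The tiny per-language lexicons and concept names below are illustrative assumptions:

```python
# Toy sketch of the interlingua architecture: one analyzer and one
# generator per language, with language-neutral concepts in between.
# Lexicons and concept labels are illustrative assumptions.

INTERLINGUA_LEX = {
    "fr": {"oiseau": "BIRD", "jaune": "YELLOW"},
    "en": {"bird": "BIRD", "yellow": "YELLOW"},
    "es": {"pájaro": "BIRD", "amarillo": "YELLOW"},
}

def analyze(lang, words):
    # Source language -> language-neutral concepts.
    return [INTERLINGUA_LEX[lang][w] for w in words]

def generate(lang, concepts):
    # Language-neutral concepts -> target language.
    inverse = {c: w for w, c in INTERLINGUA_LEX[lang].items()}
    return [inverse[c] for c in concepts]

# French -> Spanish without any French-Spanish rules:
print(generate("es", analyze("fr", ["oiseau", "jaune"])))
```

For n languages this requires only 2n analysis/generation modules rather than a separate transfer system for each of the n(n−1) language pairs, which is what makes the approach so attractive despite its difficulty.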
Another application enabled by NLP is text summarization: the generation of a condensed but comprehensive version of an original human-composed text. This task, like Machine Translation, is difficult because producing an accurate summary depends heavily on first understanding the original material; text summarization cannot be perfected until machines can accurately process language. This does not preclude parallel research on the two topics, however, since summarization systems build on existing NLP capabilities.
There are two predominant approaches to summarization: text extraction and text abstraction. Text extraction lifts pieces from the original text and concatenates them to form the summary. The extracted pieces should be the topic sentences, that is, the most important sentences of the text, which can be identified by several methods. Among the most popular are intuitions about general paper format (positional importance), identification of cue phrases (e.g., “in conclusion”), and identification of proper nouns. Some extraction systems also assume that the most frequently used words represent the most important concepts of the text.
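The word-frequency heuristic can be sketched in a few lines of Python. This is a toy, assuming a naive punctuation-based sentence splitter and an illustrative stopword list; real extraction systems combine this signal with position and cue-phrase evidence:

```python
# Toy sketch of frequency-based text extraction: score each sentence by
# the frequency of its non-stopword words and keep the top scorers,
# preserving their original order. Stopword list is illustrative only.

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

def extract_summary(text, n_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)  # frequent words ~ important concepts

    def score(sent):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sent.lower())
                   if w not in STOPWORDS)

    keep = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit extracted sentences in their original order.
    return " ".join(s for s in sentences if s in keep)

print(extract_summary("Cats are animals. Cats like fish. Dogs bark loudly."))
# -> Cats like fish.
```

The example also exposes the approach’s word-level weakness noted below: the extracted sentences are copied verbatim, so nothing guarantees they read as connected prose.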
These methods are generally successful at topic identification and are used in most commercial summarization software. However, such systems operate at the word level rather than the conceptual level, so the resulting summaries are not always fluent or well connected. Text abstraction is a less contrived and much more complex approach to summarization. While extraction mainly entails topic identification, abstraction involves topic identification as well as interpretation and language generation. These two additional steps would make the automated summary more coherent and cohesive. To date, this approach has not been successful because the interpretation stage (the most difficult part of NLP) needs further development before it can aid summarization.