Effective Spanish Encoding Function During Data Matching Process
Researchers have proposed many analyses and algorithms that apply phonetic encoding in various sectors. María del Pilar Angeles and Adrián Espino-Gamez presented a paper that aimed to help native Spanish speakers identify an open and effective Spanish encoding function for the data matching process. They improved and extended the Spanish phonetic algorithm and evaluated data matching with Spanish Phonetic Soundex, Soundex, and Phonex in terms of precision, recall, and F-measure. They proposed a “Modified Spanish Phonetic” Soundex function, which performed better in terms of precision and F-measure.
Kenneth H. Lai and Maxim Topaz described the development of a spelling correction system for medical text. Their spell checker is based on Shannon’s noisy channel model and uses an extensive dictionary compiled from many sources.
They applied their spell checker to three types of free-text data: clinical notes, allergy entries, and medication orders. Evaluated on both misspelling detection and correction, the spell checker achieved detection performance of up to 94.4% and correction accuracy of up to 88.2%.
Shaun J. Grannis and J. Marc Overhage, from the Regenstrief Institute for Health Care and the Indiana University School of Medicine, presented a study [24] of name comparison methods that establish agreement or disagreement between corresponding names.
They tested three approximate string comparators: a modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. The Jaro-Winkler comparator achieved the highest linkage sensitivity, 97%.
Chakkrit Snae described name variations and gave a basic description of the name matching algorithms developed to overcome name variation and to find reasonable variants of names, which can be used to further reduce mismatches in record linkage and name search.
His implementation contains algorithms for a range of fuzzy matching techniques. Different name matching methods (e.g., NYSIIS, Guth, Levenshtein, Soundex, Metaphone, and Phonex) were applied and their accuracy was measured. NYSIIS and Phonex were shown to perform well and to provide sufficient flexibility to be included in the linkage/matching process for optimising name searching. The Phonex algorithm is a combination of two methods, Soundex and Metaphone.
This method has been shown to give good overall performance when applied to names in the English language. The Guth algorithm is based on the approach due to Guth. The method works through the names in left-to-right sequence and is essentially alphabetic, but it is independent of language and ethnic issues. It is straightforward to code, is portable, and gives reliable results. It is, however, weak when comparing short names.
Special Algorithm For Bangla
Although the Soundex and Metaphone algorithms show high accuracy in generating the same code for different representations of a single English name, they often fail to generate accurate codes for Bangla names. A significant amount of work has been done in our country to solve this problem.
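To make the behaviour described above concrete, the following is a minimal sketch of the classic American Soundex coding, assuming plain ASCII input. It is not the implementation evaluated in any of the cited works; the function name `soundex` and the table `_CODES` are chosen here for illustration only.

```python
# Letter-to-digit table of classic American Soundex (illustrative sketch).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name, or "" if it has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")      # digit of the previous coded letter
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:              # adjacent letters with the same digit count once
                code += digit
            prev = digit
        elif ch not in "HW":
            prev = ""                      # vowels and Y separate repeats; H and W do not
    return (code + "000")[:4]              # pad or truncate to one letter plus three digits


print(soundex("Robert"), soundex("Rupert"))   # both spellings share one code
print(soundex("Smith"), soundex("Smyth"))     # both spellings share one code
```

Because the table only covers the Latin letters A to Z, this sketch returns an empty string for a name written in Bangla script, and its letter groups reflect English spelling conventions.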
Some of the notable works are listed below.
Application of phonetic encoding for analyzing similarity of Patient's Data: Bangladesh Perspective
In this work, Abir Bin Ayub Khan, Mohammad Sheikh Ghazanfar, and Shahidul Islam Khan proposed a new algorithm named the “Modified Name Significance Algorithm”, a modified version of the “Name Significance Algorithm”. Their algorithm performed better than existing solutions such as Name Significance, Soundex, and Double Metaphone encoding.
A double metaphone encoding for approximate name searching and matching in Bangla
In this work, Naushad UzZaman and Dr. Mumit Khan proposed a Double Metaphone encoding technique for Bangla name searching and matching applications.
This encoding encapsulates the complex spelling rules of Bangla and, in addition, takes into account the special cases for names. However, the authors did not apply the method to a real dataset, so no results were reported.
A Bangla Phonetic Encoding for Better Spelling Suggestions
In this paper, Naushad UzZaman and Dr. Mumit Khan presented a phonetic encoding for Bangla that spelling checkers can use to provide better suggestions for misspelled words. The encoding is based on the Soundex algorithm. However, the work was theoretical; the authors did not report any practical results.
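The general idea of using a phonetic key to drive spelling suggestions can be sketched as follows. This is only an illustration under stated assumptions: `toy_key` is a hypothetical, deliberately crude stand-in for a real phonetic encoding, it works on English letters rather than Bangla, and it does not reproduce the encoding of the cited paper. The helpers `build_index` and `suggest` are likewise names invented for this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_key(word: str) -> str:
    """Hypothetical stand-in for a phonetic encoding: keep the first letter,
    drop later vowels, then collapse adjacent repeats in what remains."""
    word = word.lower()
    if not word:
        return ""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiou":
            continue                # later vowels do not contribute to the key
        if ch != key[-1]:
            key += ch               # skip a letter equal to the last kept one
    return key


def build_index(words: Iterable[str],
                encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group dictionary words by their key so lookups need no full scan."""
    index: Dict[str, List[str]] = defaultdict(list)
    for w in words:
        index[encode(w)].append(w)
    return index


def suggest(word: str, index: Dict[str, List[str]],
            encode: Callable[[str], str]) -> List[str]:
    """Return the dictionary words whose key equals the key of `word`."""
    return list(index.get(encode(word), []))


dictionary = ["receive", "river", "letter", "later"]
index = build_index(dictionary, toy_key)
print(suggest("recieve", index, toy_key))
print(suggest("leter", index, toy_key))
```

Passing the encoder in as a parameter keeps the lookup independent of the encoding, so a language-specific phonetic encoding could be substituted for `toy_key` without changing `build_index` or `suggest`.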