Disease Inference System: Comparison of Different
Abstract
Health plays an important role in one's happiness and well-being. Automatic disease inference is mainly used to bridge the gap between what online health seekers with unusual symptoms need and what busy doctors with biased expertise can offer. One of the main challenges in the health care sector is to predict diseases from given symptoms. A disease inference system analyzes and categorizes the needs of health seekers and asks for manifested symptoms. The basic steps for implementing the system are data preparation and processing, feature extraction, and applying deep learning, which predicts the corresponding disease. In this paper we compare different methods that can be used to implement a disease inference system.
INTRODUCTION
Online health information includes both medical resources and patient community connections. It plays an important role in patient education and self-care. A national consumer survey shows that the average consumer spends nearly 52 hours looking for health information on the internet. Online health resources fall into two categories. The first is reputable portals run by official sectors, organizations, etc., which provide accurate, well-structured health knowledge on various topics; WebMD and MedlinePlus are examples. The second category is community-based health services. A community-based health care service is meant for people of all ages who need health care assistance at home, and such services are best thought of as a sub-system of the overall health system. HealthTap and HaoDF are examples of community-based health services. These systems provide an interactive platform where health seekers ask questions by describing their symptoms and doctors answer the queries.

Community-based health services have some limitations. First, it is time-consuming for health seekers to get answers. Second, doctors have to cope with a heavy workload, which reduces efficiency. Third, since replies are conditioned on each doctor's expertise, conflicts among multiple doctors may occur.

The biggest stumbling block for automatic health systems is disease inference. Generally, people search for:
- Supplementary clues about their diseases
- Preventive information
- Possible diseases, given their manifested symptoms.

The first two involve the exact disease name, so they can be answered automatically by matching the questions against archived repositories from structured health portals. The last one seeks to predict the disease from manifested symptoms. A robust disease inference approach is the key to breaking the barrier to automatic wellness systems.

Disease inference reasons from a given question to the likely disease, and this task is difficult for the following reasons. First, the vocabulary gap makes the data inconsistent. For example, "shortness of breath" and "breathless" were used by different health seekers to refer to the same disease, dyspnea. Second, health seekers often state their queries incompletely, i.e., they describe the problem in short questions. These factors limit the performance that can be obtained by shallow learning methods such as SVM and decision trees. In shallow learning methods, the output of the learning scheme is directly followed by a classifier, as if the system had only one layer.

In this paper, six approaches are compared: SVM, KNN, MTSVM, DASVM, SASR and SCDL.
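To make the vocabulary gap concrete, here is a minimal sketch of one way to bridge it: mapping lay phrases onto a single canonical concept before classification. The synonym table and the `normalize` function are hypothetical illustrations, not part of any of the six compared approaches; a real system would rely on a curated medical vocabulary rather than a hand-written dictionary.

```python
# Hypothetical lay-term -> concept table; entries are illustrative only.
SYNONYMS = {
    "shortness of breath": "dyspnea",
    "breathless": "dyspnea",
    "dyspnea": "dyspnea",
}


def normalize(question):
    """Return the set of canonical concepts mentioned in a free-text question."""
    text = question.lower()
    found = set()
    # Try longer phrases first so a multi-word term is matched as a whole.
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        if phrase in text:
            found.add(SYNONYMS[phrase])
            # Blank out the match so shorter phrases cannot re-match inside it.
            text = text.replace(phrase, " ")
    return found


print(normalize("I get shortness of breath climbing stairs"))  # {'dyspnea'}
print(normalize("Feeling breathless at night"))                # {'dyspnea'}
```

Both questions end up with the same feature, so a downstream classifier no longer sees two unrelated inputs for the same symptom. Plain substring matching is deliberately simple here and would miss misspellings or paraphrases.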
METHODS
The methods discussed in this paper are Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Multi-Switch Transductive SVM (MTSVM), Deterministic Annealing Semi-Supervised SVM (DASVM), Stacked Auto-Encoder with Softmax Regression (SASR), and Sparsely Connected Deep Learning (SCDL).

A. Support Vector Machine

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. A hyperplane is a subspace whose dimension is one less than that of its ambient space. SVM is a supervised learning method: given labeled training data, the algorithm outputs the optimal hyperplane (the one with the largest margin between the classes), which is then used to categorize new examples.
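The following is a minimal sketch of a linear SVM, assuming labels in {-1, +1} and dense feature vectors. It fits the hyperplane w . x + b = 0 by plain full-batch subgradient descent on the regularized hinge loss, a swapped-in optimizer chosen for readability; production SVM libraries use dedicated solvers instead. The parameter names `lam`, `lr` and `epochs` and the toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (w, b) by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).

    X is a list of feature vectors; y holds labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of w . x + b = 0 the point x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (invented for illustration).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [4, 4]), predict(w, b, [-4, -4]))  # 1 -1
```

Only points with margin below 1 contribute to the gradient, which is why the final hyperplane depends on the examples nearest the boundary (the support vectors).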
E. Stacked Auto-Encoder with Softmax Regression
An autoencoder is an unsupervised learning algorithm that uses backpropagation and sets the target values to be the inputs themselves, so the network learns to reconstruct its input. Backpropagation is a method used in artificial neural networks to calculate the gradient of the loss with respect to the weights, which is then used to update the weights. An autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction. Feature selection tries to find a subset of the original variables, whereas feature extraction builds derived values (features) intended to be informative and non-redundant.

Here we consider three hidden layers, added incrementally with random initialization. A softmax classifier is chosen as the output layer. Softmax classifiers give a probability for each class label, whereas hinge loss gives a margin. Softmax is therefore easier to interpret, since it outputs probabilities rather than margin scores. This architecture is fully connected.
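The contrast between softmax probabilities and hinge-loss margins can be shown with a small sketch. The scores below are invented for three hypothetical disease classes, and the hinge function is one common multiclass formulation (summing over wrong classes), not necessarily the one used by the compared methods.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that are positive and sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, label):
    """Cross-entropy loss: negative log-probability assigned to the true class."""
    return -math.log(softmax(scores)[label])


def hinge_loss(scores, label, margin=1.0):
    """Multiclass hinge loss: penalize each wrong class whose score comes
    within `margin` of the true class score; zero once all margins are met."""
    return sum(
        max(0.0, s - scores[label] + margin)
        for j, s in enumerate(scores)
        if j != label
    )


scores = [3.0, 1.0, 0.2]  # toy scores for three disease classes
print(softmax(scores))          # per-class probabilities
print(hinge_loss(scores, 0))    # 0.0: the margin is already satisfied
print(softmax_loss(scores, 0))  # still positive
```

Once every wrong class is beaten by the margin, the hinge loss is exactly zero and says nothing more, while softmax still reports how confident the model is in each disease, which is what makes its output directly interpretable.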
F. Sparsely Connected Deep Learning

In sparsely connected deep learning, consider L layers with d_l nodes in the l-th layer. The first layer contains the input and the L-th layer contains the output. The intermediate layers are hidden layers, which are not observed in the data. Here, each node in a higher layer connects to only some of the nodes in the adjacent lower layer rather than to all of them. In this architecture, the last hidden layer and the output layer are fully connected.
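A sparsely connected layer can be sketched as an ordinary dense layer multiplied elementwise by a binary connection mask. This is a minimal forward-pass sketch under assumptions: the mask is supplied by the caller (how the connections are chosen is not described here), hidden layers use a sigmoid activation, and the fully connected output layer uses softmax. The function names and shapes are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sparse_layer(x, weights, mask, bias):
    """One hidden layer: mask[i][j] == 1 means upper node i is connected to
    lower node j; weights on masked-out connections have no effect."""
    return [
        sigmoid(sum(w * m * xj for w, m, xj in zip(w_row, m_row, x)) + b)
        for w_row, m_row, b in zip(weights, mask, bias)
    ]


def forward(x, hidden_layers, out_weights, out_bias):
    """hidden_layers is a list of (weights, mask, bias) tuples. The output
    layer is fully connected to the last hidden layer, with softmax on top."""
    h = x
    for weights, mask, bias in hidden_layers:
        h = sparse_layer(h, weights, mask, bias)
    scores = [sum(w * hj for w, hj in zip(row, h)) + b
              for row, b in zip(out_weights, out_bias)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical mask: hidden node 0 sees inputs 0-1, hidden node 1 sees inputs 2-3.
mask = [[1, 1, 0, 0], [0, 0, 1, 1]]
layer = ([[1.0] * 4, [1.0] * 4], mask, [0.0, 0.0])
probs = forward([1.0, 0.0, 0.0, 1.0], [layer], [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
```

Because each weight is multiplied by its mask entry, the gradient of a masked-out weight is also zero, so training leaves the sparse structure intact without extra bookkeeping.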
COMPARISON
Data is collected from EveryoneHealthy, WebMD and MedlinePlus and consists of question-answer pairs. For the comparison, we filter the whole dataset and consider certain classes of diseases. These classes contain only questions whose samples and tags were used to extract the disease names. All methods are applied to this dataset. MTSVM and DASVM support binary classification, i.e., they distinguish only two values, true or false. SASR and SCDL significantly outperform the two supervised learning algorithms, SVM and KNN. SCDL in turn shows higher accuracy than SASR. This is attributed to the number of nodes in the hidden layers: in the fully connected SASR architecture this number has to be tuned, which makes an optimal result hard to obtain.
Approach    Performance on dataset (accuracy)
SVM         77.01%
KNN         76.65%
MTSVM       84.54%
DASVM       86.44%
SASR        89.27%
SCDL        91.48%

This basic comparison shows that SCDL achieves higher accuracy than the other approaches. Next, the effect of the number of hidden layers is studied.
Number of hidden layers          Performance on dataset (accuracy)
Structure with one hidden layer      89.00%
Structure with two hidden layers     93.13%
Structure with three hidden layers   98.21%

Here we incrementally added hidden layers between the input and output layers until the convergence criterion was satisfied, and found that three hidden layers give the best performance. The convergence criterion is defined in terms of the accuracy of the deep learning model with n hidden layers.
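The incremental procedure can be sketched as a loop that adds one hidden layer at a time and stops when the accuracy gain becomes small. The exact form of the convergence criterion is not given here, so this sketch assumes a simple absolute-gain threshold; `evaluate`, `min_gain` and `max_layers` are hypothetical names, and `evaluate(n)` stands in for training and testing a model with n hidden layers.

```python
def choose_depth(evaluate, max_layers=10, min_gain=0.01):
    """Add hidden layers one at a time until accuracy stops improving enough.

    evaluate(n) is a caller-supplied function that trains a model with n
    hidden layers and returns its accuracy in [0, 1]. The search stops when
    one more layer improves accuracy by less than min_gain (an assumed rule).
    """
    best_n, best_acc = 1, evaluate(1)
    for n in range(2, max_layers + 1):
        acc = evaluate(n)
        if acc - best_acc < min_gain:
            break  # converged: the extra layer is not worth it
        best_n, best_acc = n, acc
    return best_n, best_acc


# Invented accuracies for illustration only (not results from this paper).
toy_accuracy = {1: 0.80, 2: 0.85, 3: 0.88, 4: 0.885}
print(choose_depth(toy_accuracy.get))  # (3, 0.88)
```

With these toy numbers the fourth layer adds only half a percentage point, below the assumed threshold, so the search settles on three hidden layers.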
CONCLUSION
This paper compares six approaches that can be used to implement a disease inference system. The comparison shows that the sparsely connected deep learning architecture with three hidden layers gives the most accurate results of all the approaches considered, and that it is generalizable and scalable. Classical deep learning architectures are densely connected, and the number of nodes in each hidden layer has to be adjusted. In contrast, sparsely connected deep learning improves efficiency, and the number of hidden nodes is determined rather than adjusted.