Application of Data Mining Techniques in Pharmacovigilance
Aim
To discuss the potential use of data mining and knowledge discovery in databases for the detection of adverse drug events (ADEs) in pharmacovigilance.
Methods
A literature search was conducted to identify articles that contained details of data mining, signal generation or knowledge discovery in relation to adverse drug reactions or pharmacovigilance in medical databases.
Results
ADEs are common and cause significant mortality, and, despite existing pharmacovigilance systems, drugs have been withdrawn because of ADEs many years after licensing. Knowledge discovery in databases (KDD) is a technique that may be used to detect potential ADEs more efficiently. KDD involves the selection of data variables and databases, data preprocessing, data mining, and data interpretation and utilization.
Data mining encompasses several statistical techniques, including cluster analysis, link analysis, deviation detection and disproportionality assessment, which can be used to detect ADE signals and to assess their strength. Currently, the only data mining methods used in pharmacovigilance are disproportionality methods, such as the Proportional Reporting Ratio (PRR) and the Information Component (IC), which have been used to analyze the UK Yellow Card Scheme spontaneous reporting database and the WHO Uppsala Monitoring Centre database. The association of pericarditis with practolol but not with other β-blockers, the association of captopril and other angiotensin-converting enzyme inhibitors with cough, and the association of terfenadine with heart rate and rhythm disorders could all be identified by mining the WHO database.
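Both disproportionality measures named above can be computed from a 2×2 table of report counts. The sketch below is a minimal illustration assuming the usual layout (a = reports with the drug and the event, b = the drug with other events, c = other drugs with the event, d = neither); the function names and the counts are hypothetical. The IC shown is only the crude log2 observed-to-expected ratio, without the Bayesian shrinkage applied in the BCPNN method used by the Uppsala Monitoring Centre.

```python
import math


def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio from a 2x2 table of report counts.

    a: drug of interest AND event of interest
    b: drug of interest, other events
    c: other drugs, event of interest
    d: other drugs, other events
    """
    return (a / (a + b)) / (c / (c + d))


def information_component(a: int, b: int, c: int, d: int) -> float:
    """Crude IC: log2(observed / expected) co-reporting of drug and event.

    Expected count assumes drug and event are reported independently.
    No Bayesian shrinkage is applied here (an assumption of this sketch).
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2(a / expected)


# Hypothetical counts, not real Yellow Card or WHO data.
a, b, c, d = 20, 80, 100, 9800
print(round(prr(a, b, c, d), 2))                    # 19.8
print(round(information_component(a, b, c, d), 2))  # 4.06
```

A PRR well above 1 and a positive IC both indicate that the event is reported more often with this drug than would be expected from the rest of the database.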
Approach
We define knowledge discovery in databases (KDD) as the process of extracting previously unknown, valid and actionable information from large databases or other information sources. The process requires a definition of the project goals, dataset acquisition, data cleaning and preprocessing, data mining, and data interpretation and utilization.
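The KDD steps listed above can be chained into a simple pipeline. This is a toy sketch under stated assumptions: each report is a dictionary with hypothetical `drug`, `event` and `age` fields, the mining step is reduced to counting drug-event pairs, and the `min_count` threshold used at the interpretation step is arbitrary.

```python
from collections import Counter

# Hypothetical raw spontaneous reports; field names are illustrative only.
RAW_REPORTS = [
    {"drug": "DrugA", "event": "Cough", "age": 61},
    {"drug": "druga ", "event": "cough", "age": None},
    {"drug": "DrugB", "event": None, "age": 45},
    {"drug": "DrugA", "event": "Rash", "age": 38},
    {"drug": "DrugC", "event": "Cough", "age": 70},
]


def select(reports, fields=("drug", "event")):
    """Selection: keep only the variables relevant to the project goal."""
    return [{f: r.get(f) for f in fields} for r in reports]


def preprocess(reports):
    """Cleaning: drop incomplete records and normalize spelling and case."""
    cleaned = []
    for r in reports:
        if r["drug"] is None or r["event"] is None:
            continue  # missing data can only be excluded, not repaired
        cleaned.append({"drug": r["drug"].strip().lower(),
                        "event": r["event"].strip().lower()})
    return cleaned


def mine(reports):
    """Data mining (simplified): count each drug-event pair."""
    return Counter((r["drug"], r["event"]) for r in reports)


def interpret(pair_counts, min_count=2):
    """Interpretation: pass frequent pairs on for expert review."""
    return sorted(pair for pair, n in pair_counts.items() if n >= min_count)


candidates = interpret(mine(preprocess(select(RAW_REPORTS))))
print(candidates)  # [('druga', 'cough')]
```

In practice the mining step would apply a disproportionality or other data mining method rather than raw counts; the sketch only shows how the stages separate.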
Data Mining Techniques
- Predictive Modelling
- Clustering or database segmentation
- Link Analysis
- Deviation detection
Predictive modelling develops a model relating a dependent variable to a set of independent variables, in a manner similar to multiple regression analysis.
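As an illustration of regression-style predictive modelling, the sketch below fits an ordinary least-squares model with an intercept by solving the normal equations. It uses only the Python standard library; the function names and the example data (a hypothetical outcome explained by two hypothetical predictors) are assumptions, not part of the original work.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_linear_regression(X, y):
    """OLS via the normal equations (X^T X) beta = X^T y, with an intercept."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, x):
    """Predicted dependent variable for one set of independent variables."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
beta = fit_linear_regression(X, y)
print([round(b, 6) for b in beta])  # [1.0, 2.0, 3.0]
```

Solving the normal equations directly keeps the example short; a numerical library with a more stable least-squares solver would normally be preferred for real data.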
Clustering uses an algorithm that segregates a database by evaluating the dissimilarity between records. Pairs of records are compared on the values of their individual fields, and grouping similar records into clusters provides a fast and effective way of ordering large datasets.
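One way to compare pairs of records field by field is a simple-matching dissimilarity (the share of fields whose values differ), combined here with single-linkage clustering at a fixed threshold. This is a hedged sketch: the record fields, the threshold and the choice of single linkage are illustrative assumptions, and the all-pairs comparison is O(n²), so very large databases would need a more scalable algorithm.

```python
def dissimilarity(r1, r2):
    """Simple-matching dissimilarity: fraction of fields that differ."""
    return sum(a != b for a, b in zip(r1, r2)) / len(r1)


def cluster(records, threshold):
    """Single-linkage clustering: records end up in the same group if they are
    linked, directly or through a chain, by dissimilarity <= threshold."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if dissimilarity(records[i], records[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical records: (sex, age band, drug, event).
RECORDS = [
    ("F", "60-69", "drugA", "cough"),
    ("F", "60-69", "drugA", "rash"),
    ("M", "60-69", "drugA", "rash"),
    ("M", "20-29", "drugB", "nausea"),
    ("M", "20-29", "drugB", "headache"),
]
print(cluster(RECORDS, threshold=0.25))  # [[0, 1, 2], [3, 4]]
```

Records 0 and 2 differ in two of four fields, yet they share a cluster because record 1 links them; that chaining is characteristic of single linkage.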
Link analysis refers to methods that identify associations or links between records or sets of data.
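Link analysis can be illustrated with simple association rules over items that co-occur in the same report. The sketch below assumes each hypothetical report is a list of drug and event names, and returns each rule X → Y with its support (the number of reports containing both items) and confidence (support divided by the number of reports containing X); `min_support` is an arbitrary assumed cut-off.

```python
from collections import Counter
from itertools import combinations


def pair_links(reports, min_support=2):
    """Return rules (X, Y, support, confidence) for item pairs that
    co-occur in at least `min_support` reports."""
    item_counts = Counter()
    pair_counts = Counter()
    for items in reports:
        unique = sorted(set(items))  # count each item once per report
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (x, y), n in pair_counts.items():
        if n < min_support:
            continue
        rules.append((x, y, n, n / item_counts[x]))  # rule x -> y
        rules.append((y, x, n, n / item_counts[y]))  # rule y -> x
    return sorted(rules)


# Hypothetical reports: each lists the drugs and events it mentions.
REPORTS = [
    ["drugA", "cough"],
    ["drugA", "cough"],
    ["drugA", "dizziness"],
    ["drugB", "nausea"],
    ["drugB", "cough"],
]
for rule in pair_links(REPORTS):
    print(rule)
```

Only drugA and cough co-occur in two or more reports here, so each direction of that pair is returned with a confidence of 2/3.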
Deviation detection looks for outliers, i.e. values that deviate from the norm, which can be identified either graphically or statistically. Visualization techniques, e.g. scatter plots or histograms, are used to reveal patterns hidden in the data.
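A basic statistical form of deviation detection flags values whose z-score exceeds a threshold. The series below is a hypothetical set of monthly report counts for one drug-event pair, and the threshold of 2.0 is an assumption; because an extreme value also inflates the standard deviation, z-scores are conservative in small samples.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose |z-score| exceeds `threshold`.

    Requires at least two values (sample standard deviation).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical: nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Hypothetical monthly report counts for one drug-event pair.
MONTHLY_COUNTS = [4, 5, 3, 6, 4, 5, 4, 21]
print(z_score_outliers(MONTHLY_COUNTS))  # [(7, 21)]
```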
Disadvantages of Techniques
KDD processes cannot account for inaccurate or missing data, and if no signal is detected it is impossible to determine whether no ADE exists or the data are insufficient. Furthermore, KDD only generates a signal; in the context of pharmacovigilance, further studies or investigations are required to confirm a potential ADE.
Advantages
Many different ADEs can be searched for at once. In addition, information in many databases is under-utilized, so KDD may make it possible to generate new information from existing data sources at minimal extra cost. KDD will not replace traditional methods of pharmacovigilance, but used in conjunction with them it may reduce the time required to identify ADEs.
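Searching for many ADEs at once amounts to computing a disproportionality measure for every drug-event pair in a database. The sketch below applies the PRR to every pair in a list of hypothetical reports, simplified so that each report holds a single drug and a single event; the thresholds `min_prr=2.0` and `min_cases=3` are illustrative assumptions, and any flagged pair is only a signal requiring further investigation.

```python
from collections import Counter


def screen_all_pairs(reports, min_prr=2.0, min_cases=3):
    """Compute the PRR for every (drug, event) pair and keep those passing
    illustrative thresholds. Each report is assumed to be one (drug, event)."""
    n = len(reports)
    pair_counts = Counter(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    event_counts = Counter(event for _, event in reports)
    signals = []
    for (drug, event), a in pair_counts.items():
        b = drug_counts[drug] - a    # this drug, other events
        c = event_counts[event] - a  # other drugs, this event
        d = n - a - b - c            # other drugs, other events
        if a < min_cases or c == 0:
            continue  # too few cases, or PRR undefined because c == 0
        value = (a / (a + b)) / (c / (c + d))
        if value >= min_prr:
            signals.append((drug, event, a, round(value, 2)))
    return sorted(signals)


# Hypothetical reports, one drug-event pair per report.
PAIR_REPORTS = (
    [("drugA", "cough")] * 4 + [("drugA", "rash")] * 1
    + [("drugB", "cough")] * 2 + [("drugB", "rash")] * 10
    + [("drugC", "cough")] * 3 + [("drugC", "nausea")] * 8
)
for signal in screen_all_pairs(PAIR_REPORTS):
    print(signal)
# ('drugA', 'cough', 4, 3.68)
# ('drugB', 'rash', 10, 13.33)
```

The drugC-nausea pair is skipped because no other drug is reported with nausea, which makes the PRR denominator zero; a real screen would need an explicit rule for that case rather than silently dropping it.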