Pharma R&D Today

Ideas and Insight supporting all stages of Drug Discovery & Development

Text Mining for Pharmacovigilance Purposes

Posted on July 7th, 2016 in Pharmacovigilance


Text mining, or text data mining, is the process of retrieving relevant information from large amounts of ‘unstructured’ text with the help of automated pattern learning.

High-quality information

Text mining provides a technique to transform free-text data into high-quality, relevant information. The process usually starts with structuring the free-text input, continues with finding statistical patterns within the structured data, and ends with evaluating and interpreting the generated output. In text mining, high quality refers to a combination of relevance, unexpectedness and interest.

The high-quality, relevant information is typically retrieved by recognizing and analyzing statistical patterns in free text. Typical text mining tasks include:

  • Text classification and text clustering
  • Concept/entity extraction
  • Producing granular taxonomies
  • Sentiment analysis
  • Document summarization
  • Learning relations between the entities

The overarching goal of text mining is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods.

Typical text mining applications scan a defined set of natural-language documents and either model the documents for predictive classification or populate a database with the retrieved information.
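As a rough illustration of turning text into data, the sketch below scans a few made-up documents and counts drug and event terms that appear in the same sentence. The lexicons, the example texts and the function names are all hypothetical, and the sentence splitter is deliberately naive. Co-occurrence in a sentence is only a crude proxy for a relationship between a drug and an event, so this is a starting point rather than a working extraction system.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would use a curated vocabulary.
DRUGS = {"ibuprofen", "warfarin"}
EVENTS = {"nausea", "bleeding", "headache"}

def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (illustrative only)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def extract_pairs(text):
    """Return (drug, event) pairs that co-occur within one sentence."""
    pairs = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for drug in sorted(tokens & DRUGS):
            for event in sorted(tokens & EVENTS):
                pairs.append((drug, event))
    return pairs

def build_table(documents):
    """Count sentence-level (drug, event) co-mentions across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_pairs(doc))
    return counts

# Made-up example texts, not real reports.
docs = [
    "Started warfarin last week. Noticed bleeding gums after taking warfarin.",
    "Ibuprofen gave me nausea. The headache went away though.",
    "Warfarin and bleeding again today!",
]
table = build_table(docs)
for (drug, event), n in sorted(table.items()):
    print(drug, event, n)
```

The resulting counts are the kind of structured rows that could be stored in a database and analyzed further.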

Text mining for pharmacovigilance

Pharmacovigilance is the process and science of monitoring adverse drug events and other drug-related problems, and of taking action to minimize the risks and increase the effectiveness of medicinal products. Collecting and analyzing adverse drug reactions from several sources is the cornerstone of pharmacovigilance activities. Traditionally, pharmacovigilance has relied mainly on spontaneous reports of adverse drug reactions submitted by health care professionals. Experts analyze these case reports, mainly through manual case-by-case review. In recent years, algorithms and computerized systems have been introduced as tools to identify disproportionately reported adverse drug reactions in spontaneous reporting systems and clinical trial databases.
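The post does not name a specific disproportionality measure. One widely used option is the proportional reporting ratio (PRR), which compares how often an event is reported for one drug with how often it is reported for all other drugs in the same database. The sketch below assumes a simple 2x2 table of report counts; the function name and the numbers are illustrative only.

```python
def proportional_reporting_ratio(a, b, c, d):
    """
    Compute the PRR from a 2x2 table of report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    rate_for_drug = a / (a + b)        # share of the drug's reports that mention the event
    rate_for_others = c / (c + d)      # same share for all other drugs
    return rate_for_drug / rate_for_others

# Made-up counts: 20 of 100 reports for the drug mention the event,
# compared with 100 of 10,000 reports for all other drugs.
prr = proportional_reporting_ratio(a=20, b=80, c=100, d=9900)
print(round(prr, 2))
```

A PRR well above 1 means the event is reported more often for the drug than for the comparison group. It signals a pattern worth expert review and does not by itself establish causation.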

Mining adverse drug reactions from free-text sources

Due to extensive underreporting of adverse drug reactions by health care professionals, together with a growing public concern about the safe use of medicines, interest in other sources of adverse drug reactions is growing.

In the last decade, research in pharmacovigilance has focused on the secondary use of information from electronic health records for safety purposes. More recently, other sources such as social media, internet search history, biomedical literature and product information documents have been investigated to support holistic pharmacovigilance.

Because these sources contain large volumes of unstructured data and were not designed to collect safety information, retrieving information from them can be complicated. Text mining is necessary to retrieve and leverage data from these sources for pharmacovigilance purposes. Harpaz et al. provide an overview of recent experience with text mining several sources for pharmacovigilance purposes [1].

Each source has its own features and challenges. The biggest challenge of mining sources such as social media, electronic health records, product information documents and internet search history is that a large proportion of the information is stored as free text. Free text is unstructured, and the variability of natural language makes it challenging to analyze. Another major challenge is limited access to several of these sources, such as electronic health records, internet search history and social media platforms.
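To make the variability of natural language concrete, the sketch below maps lay phrases and misspellings to a preferred term using a small synonym table and approximate string matching from Python's standard `difflib` module. The synonym table, the `normalize_mention` helper and the 0.8 similarity cutoff are assumptions chosen for illustration; a production system would rely on a curated medical terminology and more robust matching.

```python
import difflib

# Hypothetical mapping from lay or variant phrases to a preferred term.
SYNONYMS = {
    "nausea": "nausea",
    "queasy": "nausea",
    "headache": "headache",
    "dizziness": "dizziness",
    "dizzy": "dizziness",
}

def normalize_mention(mention, cutoff=0.8):
    """Map a free-text mention to a preferred term, or None if nothing is close enough."""
    key = mention.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Fall back to approximate matching to tolerate misspellings.
    close = difflib.get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[close[0]] if close else None

for raw in ["Queasy", "headach", "rash"]:
    print(raw, "->", normalize_mention(raw))
```

Mentions that cannot be mapped return `None` instead of being forced onto the nearest term, so they can be routed to manual review.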

With regard to retrieving adverse drug events from biomedical literature, research has not settled whether to extract them from the indexing terms or from the abstracts. No studies directly compare the two techniques; however, both have been shown to improve the identification of adverse drug event relationships.

The goal of pharmacovigilance is to identify adverse drug events as early and as accurately as possible. It is important to understand how the different data sources contribute to this goal. Procedures and techniques for combining safety information extracted from different sources with text mining need to be developed.

[1] Harpaz et al. Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Saf. 2014 Oct; 37(10): 777-790.
