Pharma R&D Today

Ideas and Insight supporting all stages of Drug Discovery & Development


Using Text Mining to Find Treatments for Rare Diseases

Posted on May 5th, 2016 by in Pharma R&D

Findacure UK Cure CHI

Compare the number of PubMed search results for rare diseases, such as “congenital hyperinsulinism” (767), schwannomatosis (250) or “Mabry syndrome” (34), with the counts for “diabetes” (542,987), “heart attack” (217,694) or “obesity” (237,307), and the disparity in research attention is obvious. The number of documents differs greatly. Yet when textual sources are used for disease modeling and drug repurposing, handling too much information and handling too little can pose surprisingly similar problems.

In both cases, it is important to have access to all available information so that you can find all relevant pieces, put them in context and tie them together to generate new hypotheses. Even for rare diseases, where the number of papers is small, doing this analysis purely by hand is not simple, because a lot of additional information is needed to fill in the missing pieces. A common approach to working effectively with text and reducing the manual effort of summarizing published research is text mining.

You can learn more about this topic by attending the webinar “Mobilizing Informational Resources for Rare Disease with Elsevier Text Mining” on May 19, 2016, at 10:00 AM EDT.


For the Findacure project we used Elsevier Text Mining to:

  • Build a library:
    • Find all papers that focus on congenital hyperinsulinism (CHI)
    • Find all papers that mention the effects of sirolimus on insulin sensitivity or resistance
  • Extract information:
    • Find all proteins, small molecules, diseases and cell processes that are relevant to CHI

CHI library: Text mining for document search

To build a CHI library (an organized collection of documents relevant to CHI), we used several features of Elsevier Text Mining that simplify document search:

  • A search engine powered by dictionaries and taxonomies that simplifies querying. Dictionaries let a single search term query all 9 synonyms of congenital hyperinsulinism. Taxonomies let you name a single category to search for every concept in it (for example, “protein” for all individual proteins, or “rare disease” for all rare diseases)
  • Finding concepts within a specified distance of each other (for example, the word “focal” next to the term “congenital hyperinsulinism”) to identify and organize documents that mention particular CHI subtypes
  • Querying metadata, such as document types and keywords, to organize documents by study type (case studies, reviews, etc.)
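To make these three features concrete, here is a minimal Python sketch. It assumes small, hand-made resources. The synonym list holds only four illustrative forms, because the post does not list the nine synonyms Elsevier Text Mining uses. The taxonomy and the functions expand, near and filter_by_type are hypothetical names, not the product's API.

```python
import re

# Hypothetical, hand-made resources for illustration only. Elsevier Text Mining
# uses its own curated dictionaries (e.g. nine CHI synonyms) and taxonomies.
SYNONYMS = {
    "congenital hyperinsulinism": [
        "congenital hyperinsulinism",
        "persistent hyperinsulinemic hypoglycemia of infancy",
        "PHHI",
        "nesidioblastosis",
    ],
}
TAXONOMY = {
    "rare disease": ["congenital hyperinsulinism", "schwannomatosis", "Mabry syndrome"],
}


def expand(term):
    """Return every surface form a query term should match.

    A taxonomy category expands to its member concepts; each concept then
    expands to its dictionary synonyms (or to itself if it has none).
    """
    members = TAXONOMY.get(term, [term])
    forms = []
    for concept in members:
        forms.extend(SYNONYMS.get(concept, [concept]))
    return forms


def tokens(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_starts(toks, phrase):
    """Token indices at which the tokenized phrase begins."""
    p = tokens(phrase)
    return [i for i in range(len(toks) - len(p) + 1) if toks[i:i + len(p)] == p]


def near(text, word, concept, max_dist=1):
    """True if the single word lies within max_dist tokens of any form of concept.

    max_dist=1 means directly adjacent, e.g. "focal congenital hyperinsulinism".
    """
    toks = tokens(text)
    word_positions = phrase_starts(toks, word)
    for form in expand(concept):
        length = len(tokens(form))
        for start in phrase_starts(toks, form):
            end = start + length - 1
            for w in word_positions:
                if (w < start and start - w <= max_dist) or (w > end and w - end <= max_dist):
                    return True
    return False


def filter_by_type(docs, doc_type):
    """Metadata query: ids of documents whose type field matches doc_type."""
    return [d["id"] for d in docs if d["type"] == doc_type]
```

With this setup, near("A focal PHHI lesion was resected.", "focal", "congenital hyperinsulinism") is true even though the canonical term never appears. Dictionary expansion happens before the proximity check, so a subtype query also finds documents that use a synonym.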

If the goal is to find documents that mention a single concept, keywords still do a decent job. The second task on our list shows the drawbacks of the keyword approach: adding documents that mention the effects of sirolimus on insulin sensitivity to the CHI library. The terms “sirolimus” and “insulin sensitivity” may occur in different parts of a document without being directly related to each other. Our text mining capability allows searching within a “semantic relation,” which requires the terms to be grammatically connected.
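The difference between the two query styles can be sketched in Python under one simplifying assumption. Here, requiring both terms in the same sentence stands in for a semantic relation. A true semantic relation requires a grammatical connection, which needs a parser and is not modeled here. The term lists and function names are hypothetical. “Rapamycin” is included because it is another name for sirolimus.

```python
import re

# Hypothetical term lists; "rapamycin" is another name for sirolimus.
SIROLIMUS = ["sirolimus", "rapamycin"]
INSULIN_SENSITIVITY = ["insulin sensitivity", "insulin resistance"]


def sentences(text):
    """Naive sentence splitter: break after . ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(text, forms):
    """Case-insensitive substring match against any of the forms."""
    low = text.lower()
    return any(form.lower() in low for form in forms)


def keyword_match(doc, a_forms, b_forms):
    """Keyword approach: both terms appear anywhere in the document."""
    return mentions(doc, a_forms) and mentions(doc, b_forms)


def same_sentence_matches(doc, a_forms, b_forms):
    """Stricter proxy: sentences that mention both terms.

    A real semantic-relation search would also check that the terms are
    grammatically connected; this sketch does not parse the sentence.
    """
    return [s for s in sentences(doc) if mentions(s, a_forms) and mentions(s, b_forms)]
```

A document that says “Patients received sirolimus after transplantation. Insulin sensitivity was measured in a separate cohort.” passes keyword_match but yields no same-sentence match. That is the false positive the semantic query is meant to avoid.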

Documents returned for a semantic query are highly relevant because they contain sentences such as the following (rapamycin is another name for sirolimus):

  • More strikingly, insulin sensitivity was altered with respect to different lengths of rapamycin treatment.
  • In agreement, in humans acute rapamycin-treatment can actually improve whole-body insulin sensitivity.

Our partners at Findacure, as well as their collaborators (researchers and doctors studying CHI and the use of sirolimus to treat CHI), can access the CHI library via Mendeley, browse collections of papers and read Elsevier’s full text publications provided for this project by ScienceDirect.

Finding key players: Text mining for information extraction

Identifying highly relevant documents saves time, and extracting information from the text takes efficiency one step further. To identify proteins potentially involved in CHI progression, we required the terms “protein” and “congenital hyperinsulinism” to be in one semantic relation, just as we did for “sirolimus” and “insulin sensitivity.” This time, though, we wanted more than relevant documents. We also wanted a quick summary: a list of genes involved in CHI, and the way they are involved.

For this we used the “Search Enhancement Terms” taxonomy in Elsevier Text Mining. It contains a hierarchy of concepts that describe biological relations, plus their synonyms. Examples include “positive regulation” (activation, increase, up-regulation), “negative regulation” (block, prevention, inhibition) and “terms for genetic variations” (mutation, deletion, chromosomal aberration). Figure 1 shows the Elsevier Text Mining output: a preview of sentences with the relevant terms highlighted.


Figure 1 – Finding proteins and genes related to CHI in Elsevier Text Mining.
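The extraction step described above can be approximated in Python. The approach finds a protein and CHI in the same sentence, then labels the link with a normalized relation from a relation taxonomy. This is a minimal sketch under stated assumptions. The relation dictionary is a small, hypothetical slice, not the actual “Search Enhancement Terms” taxonomy. The protein list is an illustrative stand-in for the “protein” category. Sentence co-occurrence replaces the grammatical check that a semantic relation performs.

```python
import re

# Hypothetical slice of a relation taxonomy: normalized relation -> surface forms.
# The real "Search Enhancement Terms" taxonomy is larger and curated by Elsevier.
RELATION_TERMS = {
    "positive regulation": ["activation", "activates", "increase", "increases", "up-regulation"],
    "negative regulation": ["block", "blocks", "prevention", "inhibition", "inhibits"],
    "genetic variation": ["mutation", "mutations", "deletion", "chromosomal aberration"],
}
# Illustrative stand-ins for the "protein" category and the CHI dictionary entry.
PROTEINS = ["ABCC8", "KCNJ11", "GCK", "GLUD1"]
CHI_FORMS = ["congenital hyperinsulinism", "CHI"]


def find_terms(sentence, terms):
    """Terms that occur in the sentence as whole words (case-insensitive)."""
    return [
        t for t in terms
        if re.search(r"\b" + re.escape(t) + r"\b", sentence, flags=re.IGNORECASE)
    ]


def extract(sentence):
    """Return (protein, disease, relation) triples found in one sentence.

    Co-occurrence within a sentence is used as a stand-in for the grammatical
    check that a semantic-relation search performs.
    """
    proteins = find_terms(sentence, PROTEINS)
    if not proteins or not find_terms(sentence, CHI_FORMS):
        return []
    relations = [rel for rel, forms in RELATION_TERMS.items() if find_terms(sentence, forms)]
    return [(p, "congenital hyperinsulinism", r) for p in proteins for r in relations]
```

Mapping many surface words onto a few normalized relation names is what makes the output summarizable. “Mutations,” “deletion” and “chromosomal aberration” all collapse into one “genetic variation” label.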

Matched terms can be exported, with supporting bibliographic information, as a table. Each row contains the normalized protein name, the normalized disease name and the normalized term for the relation between them. The information from the literature is now structured and can be used in several ways. It can serve as a literature summary. It can drive visualization of published information (see the visualization in Pathway Studio). It can also be used to find drugs in Reaxys that target proteins involved in the CHI mechanism.
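Once the relations are in a table, ordinary data handling applies. The Python sketch below assumes a simple four-column layout (protein, disease, relation, source id); this is not Elsevier's actual export format, and the column names and placeholder ids are hypothetical. It shows three uses: serializing the table to CSV, counting distinct supporting documents per relation, and pulling a protein list to look up drugs for.

```python
import csv
import io
from collections import defaultdict

# Each row: (normalized protein, normalized disease, normalized relation, source id).
# Column names and the source-id field are assumptions, not Elsevier's export format.
HEADER = ["protein", "disease", "relation", "source_id"]


def to_csv(rows):
    """Serialize the relation table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


def evidence_counts(rows):
    """Number of distinct source documents supporting each (protein, relation) pair."""
    sources = defaultdict(set)
    for protein, _disease, relation, source in rows:
        sources[(protein, relation)].add(source)
    return {pair: len(ids) for pair, ids in sources.items()}


def proteins_for(rows, disease):
    """Distinct proteins linked to a disease, e.g. a target list for a drug lookup."""
    return sorted({protein for protein, d, _rel, _src in rows if d == disease})
```

The counting function uses sets of source ids, so a paper that states the same relation in several sentences is counted once. For a literature summary, that is usually the more honest measure of support.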

Using the same approach, we also identified all drugs that were mentioned in the context of CHI, and all diseases and cell processes associated with this condition.


Figure 2 – Visualization of the relations between proteins/genes and CHI (canonical name: Persistent Hyperinsulinemia Hypoglycemia of Infancy) identified by text mining. Different relation types are shown in different colors, and the effect (positive or negative) is shown with a + or – sign at the end of the arrow.
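The same kind of picture can also be drawn outside Pathway Studio from an exported relation table. The Python sketch below writes Graphviz DOT text, with one color per relation type and the + or – sign placed next to the arrowhead through the headlabel edge attribute. The relation-type names and colors are hypothetical and do not reproduce Pathway Studio's palette. The sketch also assumes node names contain no double quotes.

```python
# Hypothetical color per relation type; Pathway Studio uses its own palette.
COLORS = {"regulation": "blue", "genetic change": "red", "expression": "darkgreen"}


def to_dot(edges):
    """Render (source, target, relation_type, effect) edges as Graphviz DOT text.

    effect is "+" or "-" for a known direction of effect, or "" if unknown;
    it is drawn with headlabel, i.e. next to the arrowhead.
    Node names are assumed to contain no double quotes.
    """
    lines = ["digraph chi_relations {", "  rankdir=LR;"]
    for source, target, relation, effect in edges:
        attrs = [f"color={COLORS.get(relation, 'gray')}", f'label="{relation}"']
        if effect in ("+", "-"):
            attrs.append(f'headlabel="{effect}"')
        joined = ", ".join(attrs)
        lines.append(f'  "{source}" -> "{target}" [{joined}];')
    lines.append("}")
    return "\n".join(lines)
```

Saved to a file such as relations.dot, the output can be rendered with the Graphviz command dot -Tpng relations.dot -o relations.png.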

Coming up next on the Elsevier Pharma R&D Today blog: building disease models with the data extracted using text mining.
