Pharma R&D Today
Ideas and Insight supporting all stages of Drug Discovery & Development
A Novelty Metric for Evaluating Journal Articles and Authors
Posted on July 7th, 2020 by Matthew Clark in Pharma R&D
Scientists have long looked for ways to measure the impact and value of their research. In this article we propose a new metric that attempts to measure the recency of the facts discussed in an article.
Many of the current bibliometrics stem from the work of Eugene Garfield, who was among the first to track citations and to use citation counts in various algorithms to measure the impact of both papers and scientists. Since his work, many metrics have been developed based on citations. These include the following popular metrics (this is not an exhaustive list):
All of these are based on how often other authors found a work worth citing in subsequent articles. These are all good metrics.
Here I propose a measure of the novelty of an article: the number of new facts reported in the article. Elsevier’s RESNET has extracted millions of biological relationships from article abstracts and full text in the literature. In this study, these are the “facts” used to evaluate the novelty of a paper.
Here is an example biological relationship, “citrate upregulates cell differentiation”, which is mentioned in 17 articles.
The types of relationships extracted in RESNET include regulation, binding, adverse events, pathway-disease links, ligand-protein binding and many others. Each is annotated with all of the articles stating the fact.
Journal articles generally discuss many known relationships in addition to proposing novel ones. The weighting algorithm we are experimenting with down-weights each relationship by the number of previous papers that state it, so that the score favors papers stating novel relationships. The “novelty score” of a paper can therefore be computed with the following formula, based on all the relationships extracted from the paper:
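One reading of this weighting, consistent with the figures quoted in the text (1.0 for a never-before-reported relationship, 0.01 for one stated in 100 papers), is a sum of reciprocal counts. The notation is illustrative and assumed here, not taken from RESNET: $R(p)$ is the set of relationships extracted from paper $p$, and $n_r(p)$ is the number of papers stating relationship $r$ that were published in the same year as $p$ or earlier, counting $p$ itself.

```latex
\mathrm{novelty}(p) \;=\; \sum_{r \in R(p)} \frac{1}{n_r(p)}
```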
A novel, never-before-reported relationship contributes 1.0, while a relationship stated in 100 previous papers adds only 0.01 to the score (here “previous” means published in the same year as the paper being ranked or earlier). Papers stating newer, less frequently reported relationships will therefore have higher scores. Relationships that appear in the introduction of an article to explain the background of the research, for example, may not add significantly to the novelty score. Unlike citation scores, this metric is static: since it only counts the current and previous years for the article being scored, it is not changed by future articles.
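The down-weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the RESNET implementation: the function name `novelty_score`, the `mentions_by_relationship` index (relationship mapped to the publication years of every paper stating it) and the example relationships are all hypothetical. It assumes the count for each relationship includes the paper being scored, so a relationship with no earlier mention contributes 1.0.

```python
def novelty_score(paper_year, relationships, mentions_by_relationship):
    """Sum 1/n over a paper's relationships (hypothetical sketch).

    n is the number of papers stating the relationship that were
    published in paper_year or earlier, including the scored paper.
    """
    score = 0.0
    for rel in relationships:
        years = mentions_by_relationship.get(rel, [])
        # Later years are ignored, which is what keeps the score static.
        n = sum(1 for year in years if year <= paper_year)
        # Assumption: the scored paper always counts, so n is at least 1.
        n = max(n, 1)
        score += 1.0 / n
    return score


# Hypothetical index: relationship -> publication years of papers stating it.
mentions = {
    "A binds B": [2015],                          # first reported in 2015
    "C upregulates D": [2001] * 99 + [2015],      # 100 papers up to 2015
    "E regulates F": [2015, 2018, 2020],          # later mentions are ignored
}

score_2015 = novelty_score(2015, list(mentions), mentions)
print(round(score_2015, 2))
```

In this toy index the 2018 and 2020 mentions of "E regulates F" do not lower the score of the 2015 paper, which mirrors the static behavior described in the text.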
We tested the concept with about 10,000 biology articles – all from 2015 so that they all have had equal opportunity to be cited. This was to allow comparison of the novelty score with citation counts.
The graph below compares the novelty score against the citation count since 2015 for each paper in the corpus. It shows no apparent relationship between novelty score and citations, which suggests the score is providing different information. It is not surprising that the score is roughly related to the total number of relationships extracted from a paper.
The top papers of 2015, as ranked by the novelty score, are shown below.
The highest-scoring paper for 2015 was “Proteomic analyses reveal distinct chromatin‐associated and soluble transcription factor complexes”. This work identified a large number of novel protein-protein interactions. The interactions first reported in the paper, as well as some reported in only one other paper, are shown below.
One can then create author scores by summing, or applying some other function to, the novelty scores of the papers that an author has published or co-authored. This table shows the scores for the authors with the highest sum of novel facts published. However, functions other than a simple sum may better represent an author metric.
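As a rough sketch of that aggregation, the snippet below rolls per-paper novelty scores up to authors with a pluggable function. The names `author_scores`, `paper_scores` and `paper_authors`, and the example values, are hypothetical and only for illustration. The default is a simple sum, and passing `statistics.mean` or `max` shows how an alternative function could be swapped in.

```python
from collections import defaultdict
from statistics import mean


def author_scores(paper_scores, paper_authors, aggregate=sum):
    """Aggregate per-paper novelty scores into per-author scores.

    paper_scores:  paper id -> novelty score
    paper_authors: paper id -> list of author names
    aggregate:     function applied to each author's list of scores
    """
    per_author = defaultdict(list)
    for paper_id, score in paper_scores.items():
        # Every co-author receives the full score of the paper.
        for author in paper_authors.get(paper_id, []):
            per_author[author].append(score)
    return {author: aggregate(scores) for author, scores in per_author.items()}


# Hypothetical example data.
paper_scores = {"p1": 3.0, "p2": 1.5}
paper_authors = {"p1": ["Author A", "Author B"], "p2": ["Author A"]}

by_sum = author_scores(paper_scores, paper_authors)
by_mean = author_scores(paper_scores, paper_authors, aggregate=mean)
print(by_sum)
print(by_mean)
```

A sum rewards authors with many papers, while a mean or maximum rewards consistently or exceptionally novel papers. Which is more appropriate depends on what the author metric is meant to capture.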
One can see that some types of papers report more relationships, e.g. those that have tables with target binding of compounds tend to have large numbers of novel facts. So the scores may be most comparable among papers and authors when comparing the same type of research in biology or medicinal chemistry.
This bibliometrics experiment is based on biological relationships extracted for the RESNET dataset in PathwayStudio, so it focuses on biology and related life-science topics. One could imagine similar analysis for other fields such as chemistry or physics as a way to more quickly identify work reporting novel facts.
The Elsevier Services team can apply this and many other types of innovative analytics to help you identify novel discoveries and the researchers who are reporting the most innovative science in your fields of interest. We are striving to find new metrics and analytics to address your needs.