Pharma R&D Today
Ideas and Insight supporting all stages of Drug Discovery & Development
SciBite’s James Malone Talks Future of Biocuration on Conference Panel
Posted on April 15th, 2021 by Lauren Barham in AI & Data
Biocurators are among the unsung heroes of the life sciences world, performing the essential task of translating and integrating biomedical information into interoperable databases. Of course, there is a lot more to it than that, as attendees of the International Society for Biocuration’s 14th Annual Biocuration Conference already know.
At the first session of this year’s conference, held virtually on April 13 and chaired by Genentech’s Rama Balakrishnan, panelists had a lively discussion on “The Future of Biocuration,” but started by sharing some of their feelings about what the job itself entails. “When I think about it, it’s applying semantic standards to ensure data findability and aggregation,” explained Carol Bult of the Jackson Laboratory in Bar Harbor, Maine. Kambiz Karimi of Myriad Women’s Health in San Francisco talked about how it involves structuring content and using a controlled vocabulary, also noting that “cleaning up” the content is a significant part of the effort.
Maintaining data quality
The clean-up aspect of data curation connects to the issue of data quality. Balakrishnan posed the question: What does good quality data mean, and what are the key metrics to ensure that quality? For James Malone, CTO of the semantic AI company SciBite (acquired by Elsevier last year, a combination intended to deliver applied AI in a scalable, structured and repeatable way), the answer was “Testing, testing, testing.” He explained that SciBite uses comprehensive, gold-standard tests, and also pointed out that the end product should be “in the right form so you can consume it and use it.”
Bult also clarified that “Data quality and annotation accuracy are two different things” and that they are approached with different processes. Some of the panelists described methods of ensuring accuracy that are very hands-on, including directly contacting authors, publishers and laboratories to fact-check information.
On the related matter of quality control, Karimi explained that his company, which specializes in genetic testing and personalized medicine, has a peer review process and a team of 30 curators who check and double-check for errors and omissions, supplemented by some automated processes.
AI as a helper, not a replacement
When thinking of the future of biocuration, AI and machine learning loom large. “It will enhance our work by making some bottom-level decisions for us,” suggested Sandra Orchard of the European Bioinformatics Institute (EBI), who doesn’t envision machine learning replacing manual curation. Although she can certainly imagine it becoming increasingly important as ML becomes more powerful, she thinks papers will continue to require human interpretation to understand what their authors meant.
Malone said that at SciBite they are putting a lot of effort into developing techniques and building models, and that those models can work quite well—but there is still a lot of work to do to really start exploiting deep learning advances. He does see a future in training AI models to help with curation, but predicts that, “It will become an assistant; it will not replace subject matter experts.”
Shaping the perception of biocuration
Carol Bult flagged one particular danger of AI and ML, which is a broad misunderstanding about what they can actually do. “Trying to get funding for biocuration is challenging because of the perception that ML can do most of it. We’re working on the technology, but it’s not going to replace biocurators.” She feels that biocurators need to tackle this misperception and articulate a framework for how AI and biocuration go hand in hand.
Importantly, James Malone highlighted the link between data science and curation, noting that much of a data scientist’s work is data wrangling and cleaning data, and so a big chunk of what they do is, essentially, curation. Bult argued that biocurators need to make sure people realize how important their discipline is to data science, because the value of data science is already recognized across industries.
“If we frame biocuration in the context of data science, I think that will help,” she said. “We have to get better at explaining what the ROI is. What can you do—because data are quality controlled and curated—that you wouldn’t be able to do if it wasn’t curated? We have to do a better job of explaining and telling the stories.”
Malone believes the prevalence of AI will actually shine a light on the value of curation. The more commonplace AI becomes, and the more it and related approaches such as data lakes and knowledge graphs are at the forefront of decision makers’ minds, the more those decision makers will appreciate the value of well-labelled data. After all, he says, your models are only as good as your data, and the same is true for data lakes and knowledge graphs, a sign that biocuration has never been more relevant.
SciBite Marketing Lead