Pharma R&D Today

Ideas and Insight supporting all stages of Drug Discovery & Development

Umesh Nandal: Chemist and data scientist in one

Posted on March 10th, 2022 in AI & Data

Umesh Nandal is a Director of Data Science at Elsevier. As an AI expert with a Master’s in Chemistry, he embodies what Elsevier brings to the table – an actionable fusion of data and domain expertise. “Cross-functional teams combining data science, tech and domain knowledge is the only way we can achieve our collective goal: curing disease,” says Umesh.

“When I was doing my Master’s in chemistry, one of my classmates mentioned how he thought that one day, we’ll be able to automatically extract chemical structures and information,” recalls Umesh. “And how we’ll also be able to see these structures dynamically as they interact with genes and proteins.”

“At the time, I don’t think many students even imagined that as a possibility – and the impact such a possibility could have on curing diseases. But it certainly set me thinking…”

Quality data is everything

It also set Umesh to action. After completing his Master’s, he switched to bioinformatics and immersed himself in data analytics and AI technologies. By 2005, he was already doing innovative research that applied machine learning (ML) to predict protein behavior in the spread of malaria.

“This is where I learned the fundamentals: that data is everything,” Umesh remembers. “Yes, you have to teach the machine all these rules embedded in the data. But first you need to know that the data you use is useful. And when you connect it to other useful data, they must all be prepared and cleaned up in a consistent manner. As they say: ‘Garbage in, garbage out’.”

Best-in-class competitive intelligence and novelty search

Today, Umesh leads a large and diverse data science team that supported the delivery of the acclaimed Patent Expansion project for Reaxys. The pipeline has proven a game-changer for pharma companies, which can now quickly track the competitive landscape and be alerted to any threats to the long-term patentability of a particular discovery project.

The system can enrich patents with information not only on the millions of target genes and proteins, but also on the millions of compound substances that are introduced each year. And thanks to machine-learning models, this information can be evaluated for relevance and easily accessed.

The best of both worlds

“I’m proud that I actually have both an understanding of the domain, but also the technicalities within the algorithms,” says Umesh. “It helps me understand both the content people and the technical people. In that way I can help everyone to all start talking the same language.”

“And every domain also has its own timelines and sets of problems they face. And because I am familiar with these, I have a better sense of how much time it’s going to take to solve a particular problem. It makes the planning easier in any case,” he says with a smile. 

Building the patent pipeline required a delicate dance among specialists throughout its development and evaluation, not only in terms of the technology but also in ensuring the quality of outputs. After all, the algorithms need to make sense, and keep making sense.

“Everybody came to appreciate what the others brought to the table. I certainly loved seeing Elsevier’s in-house chemistry experts get inspired by the way their knowledge was being redeployed in a new and highly impactful way,” says Umesh. 

Defining roles: we’re all data scientists now

“But of course, it didn’t happen by itself. We were building a pipeline from scratch – and one that could be continually built on or even re-used for other purposes besides patents, such as journals or perhaps even other use cases in chemistry such as polymers,” Umesh explains. “So it was important to get it right and define clear responsibilities between the data scientists, the ML experts and the content experts as they develop the different modules.” 

“For instance, with our content experts some needed to focus on the quality of the components we were building, but we also have a separate team who is checking the quality after the productionizing when it goes to the database,” he says.

There were certainly moments when people were concerned about the clarity of the data science role within the cross-functional team. “You then had to stress the power of collaboration: that we are all now in the data science game. Whether we are machine learning, content or chemistry experts, we need all these skills if we want to build robust and high-quality prediction models.”

Tuning in to the ultimate goal

While Umesh is happy in his role as “middle person” and sees the advantage of having more cross-functional individuals such as himself, he also sees the power of specialization. “It’s essential for a cross-functional team that everyone understands each other to avoid any misunderstandings. Therefore, content experts should learn fundamental concepts of ML and ML experts should do the same with the chemistry domain. In this way, we can be in better tune with each other. At the same time, we also don’t want to over-train people. Fundamentally, everyone is here for their particular skill sets,” he says.

“And I think this is key: the only reason we were able to productionize this large-scale enterprise-level pipeline and solve all these intensely tricky problems was by having a cross-functional team bringing all these different pieces together. And now we are ready to take on even more complex problems,” Umesh asserts.  

“But another idea also brought the team all together. I think we were all very motivated by the idea of building a tool that saved people time and resources. Researchers can now pivot their time and resources to other problems – towards other potential cures. After all, curing is always the ultimate goal.” 
