Pharma R&D Today
Ideas and Insight supporting all stages of Drug Discovery & Development
AI-driven innovation in life sciences: Unlocking the page – 20+ years of digital innovation at Elsevier
Posted on December 2nd, 2021 by Ann-Marie Roche in AI & Data
Mark Sheehan is VP Data Science on Elsevier’s Life Science team. His 22 years at the company map closely onto Elsevier’s digital journey over that time. And today, pharmaceutical companies can follow a similar journey – albeit highly accelerated – using Elsevier’s latest AI-driven R&D boosters. Mark has enjoyed many valuable experiences along the way, from the joys of cracking open a newly printed book, to enabling people to speedily crack new synthetic pathways at scale. “Yes, innovation always involves new technology. But it’s equally about human collaboration,” he says.
When Mark joined Elsevier in 1999 as a project manager, the company was in the first phase of transitioning away from being primarily a print publisher. With each technological transition that followed – from print to online, through content enrichment, and now to predictive modeling – Mark was closely involved in Elsevier’s transformation into an innovator in information analytics. Today, he leads the data science team for the Life Sciences, exploring how AI and other technologies can streamline the research and development work of chemists and biologists.
In many ways, Mark’s career mirrors not only Elsevier’s evolution over the same time, but also the current evolution of many companies and organizations that are embracing digital transformation to achieve their short- and long-term goals. We asked Mark to look back at some of his pivotal professional moments that may resonate with those on this journey.
FIRST WAVE: THE DIGITIZATION OF INFORMATION
So, you were there at the internet’s big bang …
Mark Sheehan: In some ways I did get the full spectrum. When I started in publishing, immediately prior to Elsevier, I was working in a very small journals publisher where they copyedited, typeset and printed the journals themselves all in the same small building in North London. And when I arrived at Elsevier it was a classic case of right place, right time. Luckily, I proved to be a terrible copy editor and proofreader – I just didn’t have the patience for it. But I did have a natural affinity for computers and some rudimentary programming skills, backed by a willingness to learn.
My first boss in Elsevier was a great guy and very supportive of my growth. But we came from very different worlds. He would love it when the first copies of a book came in from the printer – he’d pick it up, sniff the fresh smell of ink on paper, and vigorously shake it upside down to make sure the binding was good. Meanwhile, I was this enthusiastic puppy evangelizing about the internet and jumping up and down to take on each and every task relating to the shift online. He largely left me and the other geeks to it, and put a lot of trust in us to lay the foundations for the future.
By then ScienceDirect was already up and running – a sign that the, um, shelf life of those printed books was becoming limited.
ScienceDirect was actually an amazing strategy for the time. As a subscription service offering all Elsevier’s content in one place at one price, there are some comparisons with what Spotify would do with music years later. So, it was a bold shift for the whole company, first in journals then in books.
Doomsayers were predicting the death of the book.
Indeed. With the rise of desktop publishing, the whole notion of a book as a container of information was fundamentally shifting. We began thinking about what unit of information customers ultimately wanted. How could we best organize these articles and journals for them? Did they want the full book or just a chapter? Again, it’s similar to music, when customers shifted from buying CDs to purchasing individual tracks on iTunes. Forgive me, I do use a lot of music analogies, but both these industries were fundamentally disrupted in parallel.
But as a legacy print publisher, Elsevier must have had some grumpy employees unwilling to embrace the shift to digitization. And Elsevier is a rather huge company.
Yes, it’s a common perception that larger companies are slow to change. And there were challenges, let’s be clear on that. Some people really cared about the craft of print, which is great – and print does remain important in many markets. And yes, some colleagues got upset sometimes that we had to standardize our print designs so they could also work online, which I can also sympathize with. And indeed, to push through change, you have to consider the human implications as much as the technical ones. But most were quick to see the bigger story was not about paper but about the transmission of information – and being able to unlock what was written on the page for the largest scientific audience possible.
SECOND WAVE: LEVERAGING THE DATA
So, the average Elsevier employee proved to be less obsessed with books than with information?
Once they saw that the internet wasn’t a threat and wasn’t taking anything away, but rather just changing how we disseminated the information, the shift was clear for all of us. And as individual consumers, we were all evolving with the times: getting iPods, Kindles, laptops, et cetera. Everyone was aware of where the world was going, because we were part of that world in the middle of this amazing shift in society. And yes, some worried about whether their job would become obsolete. But they soon recognized that their skills were still valuable and/or adaptable in tandem with these changes.
But the true tipping point, the payback, only came later – when e-revenue eclipsed print revenue.
Yes, every year the digital revenue grew and grew, and suddenly there was this tipping point when we stopped focusing exclusively on when the book hit the warehouse for sale, and started focusing just as much on when it appeared on ScienceDirect or became available on the Amazon store. Basically, we were leveraging what we had already spent years building: a solid digitized foundation. So, this second wave was much less bumpy; it was more about providing more and more types and variations of deliverables for different consumers.
And as this reach continued to extend, the way people consumed our content fundamentally changed as well – it was no longer exclusively about browsing through the library and reading many physical copies to find what you needed. It was becoming more about how to optimize your search across many online sources and databases. So, the challenge became more about streamlining the finding of the digital needle in the exponentially expanding haystack.
THIRD WAVE: FROM MANUAL TO AUTOMATED
So, the next step was to leverage the digitized text even further: applying data science and AI to enrich, and extract from, this information in whole new ways to help with digital search and discovery.
My department first came together in Elsevier five or six years ago to look into what we could do to move the needle on data science in the Life Sciences. We discovered early on that we could do a lot to automate our traditional manual curation processes, which previously involved very smart people reading all these articles, literally page by page, and doing all sorts of clever annotations by following these dense indexing “rule books”. But we also discovered that no matter how much technology you use, there’s a human limit to how much you can scale this approach.
So, to move forward, we began to wonder what would happen if we could teach the machine to read it all, do some initial enrichment that would update our customers quickly on new research, and also flag any interesting material to be read and indexed in detail by an expert. Luckily Elsevier already had some in-house tech pioneers who had already started building some automated tools and paths that we could quickly extend for processing at scale – particularly for our chemistry database Reaxys and our biomedical literature database Embase.
Can you tell us more on how the humans-meet-machine-learning axis was applied in Reaxys?
Basically, we realized two things. First, that a single “silver bullet” technology doesn’t exist for our use cases. It certainly won’t give you the range of what a human being can do, nor what your customers are asking for. But if you stack different and complementary technologies together, they can work to cover different elements and you can get a much better view – “more sides of the elephant”, as it were.
Second … When this department first started, there was a team of about 20 PhD-level chemists and biologists who, in some cases, had been carefully curating these enrichment flows for 30-plus years. We then brought these ‘manual’ domain experts together with our data scientists and analysts to work together to ‘train the machine’. This led to amazing results – as seen with our current patent coverage, for example.
In just that first year of our team coming together, we were able to deliver new automated capabilities for Reaxys that could enrich articles from 16,000 journal titles per year, versus the 400-odd we processed previously.
What do you see as the next huge leap forward?
Well, if I can backtrack a moment … We’ve been talking about the digital journey from content (such as the books and journals on ScienceDirect) to data (such as the facts and concepts indexed from those books and journals for easier search and discovery). But the power of machine learning can also be used for predictions – to, in effect, teach the machine chemistry by feeding it massive volumes of complex chemical reactions and facts.
For example, machine learning models can not only identify well-established paths to create a certain compound as well as a trained chemist can, but also suggest previously unknown paths to synthesize that compound – paths that can be cheaper, faster, and more environmentally friendly.
Already, Entellect’s reactions workbench can be used to create such models from our Reaxys data and other high-quality data sources. And the Reaxys Predictive Retrosynthesis tool helps even very experienced chemists by suggesting new synthetic paths using a range of best-in-class proven predictive models. Meanwhile, we are continuing to work with a number of leaders in the field of predictive retrosynthesis, including eminent researchers such as Professor Mark Waller, who published a very famous paper in Nature, ‘Planning chemical syntheses with deep neural networks and symbolic AI’, which provided some of the foundational work for Reaxys.
And this process of predictive modeling can continuously improve as we add more enriched data, and as our human experts validate the outputs of the machine learning models. There’s still so much more opportunity in this space, and research continues to move forward all the time.
What projects excite you most in terms of furthering this ‘continuous improvement’?
Well, without giving away too many company secrets, we have made some fantastic advances in recent years to mine the full text of chemistry-related journal articles. And since Elsevier acquired SciBite last year, we are working to add their powerful semantic technologies into our “data science toolkit” for further advances, particularly in the biomedical space. But in general, we will continue to expand our automation capabilities, while also moving deeper into the predictive chemistry space that I mentioned earlier.
We also have a very productive research collaboration with Professor Karin Verspoor and her doctoral team in Australia via our ChEMU (Cheminformatics Elsevier Melbourne University) collaboration, related to automating ways to extract information about chemical reactions in chemical patents. And as a result, we are making real headway into the many and varied challenges of training a machine to read tables and accurately extract information from them – which is very valuable for chemists and other researchers, as you can well imagine.
We are also doing a lot of work with our research partners led by Dr. Gordon Broderick at Rochester Institute of Technology looking at ‘in silico biology’, where you try to do as much early experimentation as possible in the computer prior to live testing – which could dramatically cut both the time clinical trials take and the risks involved. So, we are really moving forward in all sorts of interesting directions at the moment.
So, moving forward means partnering?
Absolutely. With all these different directions and partnerships taking place in this predictive space, we are part of a large research community. And certainly, the Life Sciences require more collective engagement than many other sectors – in terms of involving academia, the business world, policymakers and regulatory bodies.
And we are very much a part of this community – whether it’s partnering with more academic institutions, supporting researchers at all stages of their career, expanding our interns program, or inspiring the younger generations with such initiatives as Amsterdam Data Science. Only together can we really build a healthier future.
Director, Corporate Markets Marketing, Elsevier