Expert Consult

By Ahmad Zaheen, MD MSc

Published July 19, 2022


The global effort to share data and resources about the ongoing Covid-19 pandemic has led to a flood of information as infections peak at different times around the world. Early in the pandemic, the challenge was to determine how to share data rapidly and reliably. Scientific journals have seen steep increases in submissions and face tremendous pressure to turn around manuscripts and disseminate information quickly. The response has resulted in rapid publication of thousands of articles on virtually every facet of Covid-19. In addition, preprint servers allow authors to share their data even faster by bypassing formal peer review.

In the current Covid-19 pandemic, what seem like clear answers to simple questions about virus characteristics, transmissibility statistics, and risk factors on one day often change by the next. Finding answers to specific questions requires a dedicated literature review spanning both preprint servers and peer-reviewed journals, an impossible task given that the data are updated continuously.

The White House Office of Science and Technology Policy recognized this problem and issued a call to action to the technology community. In response, Kaggle, a subsidiary of Google and a leader in artificial intelligence (AI), created a platform that uses machine learning to comb the scientific literature and answer key Covid-19 questions based on the research priorities defined by a committee of the United States National Academies of Sciences, Engineering, and Medicine and by the World Health Organization.

The platform is automated, broad-based, and updated at least weekly, with human curation to ensure the answers are sensible. Researchers can find and post data sets, and the community of statisticians and data scientists produces models to summarize the data. As a result, users can get an overview of the data pertinent to a given question and use the generated tables to help make clinical judgments or guide more directed reading. For example, if we wish to know the incubation period for SARS-CoV-2, the data table for this question lists the bulk of relevant studies published to date and includes source links, associated mean or median incubation times and ranges, sample sizes, and study types.

This curated literature extraction and compendium does not deliver conclusions. Rather, Kaggle’s platform gives users a bird’s-eye view of the literature and data on a topic, with the option to dig deeper if desired. AI can expedite the grunt work of searching and summarizing research studies; however, we need to use our natural intelligence to interpret and apply the data to make decisions that will impact people’s lives forever.

Ahmad is a 2019-2020 editorial fellow at the New England Journal of Medicine. He is from Toronto, Canada, where he is completing his training in pulmonary medicine at the University of Toronto.