Preparing a Pandemic Response: The Open Virome


Published: April 1, 2021

Last Updated: 4 years ago.


“The Open Virome” is preparing the global medical and research community with the tools and knowledge to better anticipate and respond to the next pandemic.

The total number of virus species on Earth is estimated to be over 100 million. Despite dramatic medical advances, the 1918 Spanish Flu, AIDS, SARS, Ebola and now COVID-19 have significantly impacted human society. Discovering the Earth’s viruses is a necessary step toward predicting where future pandemics may originate, and toward developing the diagnostic tools to recognize animal-to-human virus spillover events sooner, while mitigating an outbreak is still possible.

Tens of millions of gigabytes of DNA and RNA sequencing data are publicly available in the Sequence Read Archive (SRA), covering vastly diverse ecosystems from all corners of the planet (Figure 1). These data potentially contain hundreds of thousands of new viruses, but the dataset has been almost impossibly large to analyze systematically.

Figure 1: Planetary heatmap on the origins of public sequencing data in the Sequence Read Archive.

To specifically address the ongoing COVID-19 pandemic, we started an international Open Science collaboration, led by researchers from the University of British Columbia (UBC), to develop Serratus (https://cic.ubc.ca/projects/serratus-ultra-high-throughput-discovery-of-new-coronaviruses-phase-2). Serratus is an AWS-backed cloud computing architecture capable of analyzing sequencing data at planetary scale. This is achieved by deploying an ultra-optimized bioinformatics pipeline on AWS EC2 with access to 22,250 vCPUs, performing alignment faster than ever before.

With Serratus, we re-analyzed 5.7 million sequencing samples (~20 million gigabytes) and identified more than 130,000 novel species of RNA viruses. This increases the total number of known RNA viruses by over an order of magnitude, and the analysis was completed in 11 days.
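
To give a feel for that scale, here is a rough back-of-envelope sketch in Python that turns the figures quoted above (5.7 million samples, ~20 million gigabytes, 11 days) into average rates. It is only an illustration, not part of Serratus: the function name `serratus_throughput` is hypothetical, and it assumes decimal gigabytes and that the 11 days were continuous wall-clock time, neither of which is stated above.

```python
# Back-of-envelope throughput from the figures quoted in the text.
# Assumptions (not from the source): decimal gigabytes, and that the
# 11 days were continuous wall-clock time.

SECONDS_PER_DAY = 24 * 60 * 60


def serratus_throughput(samples, total_gb, days):
    """Return the average rates implied by a run of this size and duration.

    Hypothetical helper for illustration only.
    """
    return {
        "samples_per_day": samples / days,
        "gb_per_day": total_gb / days,
        "gb_per_second": total_gb / (days * SECONDS_PER_DAY),
        "mean_gb_per_sample": total_gb / samples,
    }


# Figures as quoted: 5.7 million samples, ~20 million GB, 11 days.
stats = serratus_throughput(samples=5.7e6, total_gb=20e6, days=11)

for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions the averages come out to roughly half a million samples per day and on the order of 20 gigabytes processed every second. Real throughput would have varied over the run.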

Figure 2: The Open Virome Dataset. Overview of the type of RNA viruses uncovered by Serratus

We are calling this dataset “The Open Virome” to emphasize that it is freely available (public domain) to all researchers and meant to be used to transform the field of computational virology. These new sequences will vastly improve the statistical models of viruses that underlie modern computational virology software. The dataset is also designed to be easily navigable by the research community through a graphical user interface (www.serratus.io), to accelerate discovery and knowledge translation.

COVID-19 may have caught us off-guard, and we now appreciate the economic and social cost of a zoonotic pandemic. We know this can happen again, and “The Open Virome” is preparing the global medical and research community with the tools and knowledge to better anticipate and respond to the next pandemic.

Join the Collaboration

Serratus and The Open Virome are fully collaborative Open-Science projects. If you are a scientist or software developer interested in helping out, or if you need help using Open Virome data, please get in touch.

We are currently seeking individuals experienced with:
• Computational and/or traditional virology
• Phylogenetics
• Viral ecology and zoonosis modeling
• Web-interface development
• R package development
• AWS cloud computing

Photo by Luka Senica on Unsplash

About the University of British Columbia Cloud Innovation Centre (UBC CIC)

The UBC CIC is a public-private collaboration between UBC and Amazon Web Services (AWS). A CIC identifies digital transformation challenges, the problems or opportunities that matter to the community, and provides subject matter expertise and CIC leadership.

Using Amazon’s innovation methodology, dedicated UBC and AWS CIC staff work with students, staff and faculty, as well as community, government or not-for-profit organizations to define challenges, to engage with subject matter experts, to identify a solution, and to build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills which they will later be able to apply in the workforce.