Metagenomic Detection of SARS-CoV-2 Coronavirus using CosmosID

13 March 2020 by Manoj Dadlani

The entire world is dealing with a global pandemic caused by the rapidly spreading coronavirus SARS-CoV-2. In response to this very serious public health threat, scientists are working collaboratively to provide information and tools to combat the virus. At CosmosID, we have been monitoring both the outbreak and the latest research results, especially RNA-sequencing-based data made available as the outbreak progresses. In this blog post we share resources and our progress in detecting SARS-CoV-2 with CosmosID metagenomic analysis. The CosmosID team is eager to work with you to analyze and manage this major health emergency.

Highlights

  • Emergence of SARS-CoV-2 has caused a pandemic and global health biothreat

  • SARS-CoV-2 is spread by human-to-human transmission via respiratory droplets or direct contact

  • Monitoring and controlling infection to prevent the spread of SARS-CoV-2 constitute the current primary intervention

  • CosmosID accurately detects SARS-CoV-2 in samples, providing metagenomic analysis

  • Researchers can upload metagenomic sequence files to CosmosID for SARS-CoV-2 identification and characterization

What is SARS-CoV-2? 

SARS-CoV-2 stands for “Severe Acute Respiratory Syndrome Coronavirus 2”, commonly referred to as “Coronavirus”. This pathogenic virus causes coronavirus disease (COVID-19) and belongs to a family of single-stranded RNA viruses. The virus genome spans 29,891 nucleotides. This type of virus can be found in many animal species and has the ability to cross the species barrier and infect humans. When its genomic sequence became available, scientists compared this virus genome with other available coronavirus genomes. They concluded that it is a novel virus whose closest known relatives are SARS-CoV, the virus that caused the SARS outbreak in 2003, and coronaviruses carried by bats.

Automated placement of SARS-CoV-2 to closest known viruses

On January 22, 2020 we downloaded the novel coronavirus genome (now called SARS-CoV-2) and analyzed it using the CosmosID metagenomics analysis platform (https://app.cosmosid.com). We had not yet added this new genome to our database and wanted to see which genomes, if any, our algorithms would find as closest matches. The results we obtained within minutes correspond with the phylogenetic research being done by the genomics community.

Figure 1  CosmosID results for closest matches to the SARS-CoV-2 genome before adding it to the CosmosID database (analysis on app.cosmosid.com)

As you can see in Figure 1 above, the CosmosID platform detected SARS coronavirus at the species level and Bat coronavirus BM48-31/BGR/2008 at the strain level. The platform compares NGS reads to sequence signatures (i.e., k-mers) in a database arranged as a phylogenetic tree, which contains unique and shared k-mers that map to each level in the tree. Within minutes, the fully automated analysis identified k-mers that pointed to the same bat coronavirus (highlighted in Figure 2 below) that Zhou et al. (2020) had identified as phylogenetically similar to SARS-CoV-2.
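To make the idea of unique and shared sequence signatures more concrete, here is a minimal Python sketch. It is not the CosmosID algorithm, whose details are not described in this post. It assumes a toy tree, toy genomes, and a very short signature length of 4, and it assigns every signature to the lowest tree node that covers all strains containing it. The names `parent`, `genomes`, `build_index`, and `classify` are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def path_to_root(node, parent):
    """Return the list of nodes from node up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Return the deepest tree node that is an ancestor of every given node."""
    paths = [path_to_root(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in common)


def build_index(genomes, parent, k):
    """Map each signature to a tree node: a strain if unique, an ancestor if shared."""
    owners = defaultdict(set)
    for strain, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(strain)
    return {km: lowest_common_ancestor(sorted(s), parent) for km, s in owners.items()}


def classify(reads, index, k):
    """Count, per tree node, how many distinct signatures of each read hit that node."""
    hits = defaultdict(int)
    for read in reads:
        for km in kmers(read, k):
            node = index.get(km)
            if node is not None:
                hits[node] += 1
    return dict(hits)


# Toy tree (child -> parent) and toy genomes, invented for illustration only.
parent = {"strain_A": "species_X", "strain_B": "species_X",
          "species_X": "root", "strain_C": "root"}
genomes = {"strain_A": "ACGTACGTGG",
           "strain_B": "ACGTACGTCC",
           "strain_C": "TTTTGGGGCC"}
index = build_index(genomes, parent, k=4)

# A read from a known strain hits both species-level and strain-level signatures.
known = classify(["TACGTGG"], index, k=4)
# A read from an unseen relative hits only the shared, species-level signatures.
novel = classify(["ACGTACGTAA"], index, k=4)
```

In this toy setup the read from an unseen relative is placed at the species node without any strain-level support, which is the kind of behavior that lets a tree-structured database position a genome it has never seen next to its nearest known relatives.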

Figure 2  Phylogenetic tree from Zhou et al. (2020) showing the placement of five strains of 2019-nCoV (original nomenclature for SARS-CoV-2). The arrow and highlighted name mark the virus that was chosen as the closest strain by CosmosID automated analysis in the cloud.

This example demonstrates that the unique phylogenetic structure of the CosmosID database even allows a meaningful classification of pathogens that at the point of analysis were still unknown to the world (and therefore the database). Nature published the findings by Zhou et al. (2020) the day after CosmosID concluded our analysis.

Detecting SARS-CoV-2 in metagenomic samples

The SARS-CoV-2 genome is now in the CosmosID database. As the virus continues to spread, it is becoming critical to detect and classify the virus in patient samples so that individuals suspected of carrying the disease can be identified. While many labs around the world are using RT-PCR to detect SARS-CoV-2, a more precise method of detection is sequencing RNA in patient samples. A potential limitation of this method is that only a small percentage of the reads may come from the virus of interest, which can make it more difficult to identify. The upside of metagenomic sequencing, on the other hand, is its ability to readily detect secondary pathogens that patients infected with SARS-CoV-2 may have acquired.
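The effect of a small viral read fraction can be illustrated with a rough back-of-the-envelope model. The sketch below assumes that viral reads are sampled independently, so that their count is approximately Poisson distributed. The viral fractions and the threshold of 10 reads are hypothetical values chosen for illustration, not CosmosID detection criteria.

```python
import math


def expected_viral_reads(total_reads, viral_fraction):
    """Expected number of viral reads at a given sequencing depth."""
    return total_reads * viral_fraction


def prob_at_least(min_reads, total_reads, viral_fraction):
    """P(X >= min_reads) for X ~ Poisson(total_reads * viral_fraction)."""
    lam = expected_viral_reads(total_reads, viral_fraction)
    # Sum the probabilities of seeing 0, 1, ..., min_reads - 1 viral reads.
    p_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                  for i in range(min_reads))
    return 1.0 - p_below


# Hypothetical scenario: 5 million reads, viral fraction of 1 in a million
# versus 1 in a hundred thousand, and an arbitrary threshold of 10 viral reads.
low_abundance = prob_at_least(10, 5_000_000, 1e-6)
higher_abundance = prob_at_least(10, 5_000_000, 1e-5)
print(f"P(>=10 viral reads) at 1e-6: {low_abundance:.3f}")
print(f"P(>=10 viral reads) at 1e-5: {higher_abundance:.3f}")
```

Under these assumptions, a tenfold difference in viral fraction separates a sample in which the threshold is rarely reached from one in which it is almost always reached, which is why sequencing depth matters for low-abundance targets.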

To assess the performance of CosmosID on metagenomic samples from patients diagnosed with COVID-19, our team analyzed nine bronchoalveolar lavage (BAL) metagenomic samples (deposited in the NCBI Sequence Read Archive under https://www.ncbi.nlm.nih.gov/biosample/SAMN14082199) through the CosmosID cloud application after we had added the SARS-CoV-2 genome to the CosmosID viral database.

Figure 3  Number of reads per sample from BAL metagenomic samples (analysis on app.cosmosid.com)

Although we challenged our metagenomic analysis platform with samples that contained the microbial background associated with BAL samples in addition to the coronavirus, and although several samples had shallow sequencing depth (5M reads or fewer), we were able to identify SARS-CoV-2 in all of the samples. In addition, as you’d expect when using metagenomic sequencing, the platform reported other bacteria and viruses found in the respiratory samples, as shown in the heat map in Figure 4.

Figure 4  Heat map of viruses and phages detected in COVID-19 patient BAL samples, including SARS-CoV-2, with a box around the name (analysis on app.cosmosid.com)

Figure 5  Krona plot showing bacteria and viruses in a single COVID-19 patient BAL metagenomic sample (analysis from app.cosmosid.com)

How To Run Your Metagenomic Samples

CosmosID can detect and identify SARS-CoV-2 in samples using metagenomic analysis. Researchers may use the CosmosID platform for this analysis. CosmosID does not provide diagnostic tests, but the CosmosID application is highly suitable for research purposes. Please reach out to us at info@cosmosid.com to learn more. Our team is eager to work with the community to achieve a better understanding of the virus and to help contain it. We need to work together so that our lives can return to normal.

Recommended End to End Workflow

Collect Sample → RNA Extraction (QIAGEN QIAamp Viral Mini Kit, PN 52904) → RiboZero Gold rRNA depletion protocol to remove human cytoplasmic and mitochondrial rRNA (Illumina; 48 samples, Cat no. 20020598; 96 samples, Cat no. 20020599) → Sequencing-ready library preparation using the TruSeq Stranded Total RNA Library Prep Gold kit (Illumina, Cat no. 20020599) along with the IDT for Illumina TruSeq RNA UD Indexes (96 indexes, 96 samples) (Illumina, Cat no. 20022371). Perform RNA fragmentation, first- and second-strand cDNA synthesis, adenylation, adapter ligation, and amplification according to the TruSeq Stranded Total RNA protocol. After amplification, the prepared libraries should be quantified, pooled, and loaded onto your preferred sequencer. We recommend sequencing reads of at least 75 bp in length. → Analyze sequence data on app.cosmosid.com
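Before uploading, it can be useful to check how many reads in a file meet the 75 bp read-length recommendation. The following is a minimal sketch that assumes an uncompressed FASTQ file with exactly four lines per record. The function names and the in-memory sample data are hypothetical and are not part of any CosmosID tool.

```python
import io


def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:  # the second line of each record holds the bases
            yield len(line.strip())


def fraction_at_least(fastq_handle, min_length=75):
    """Return the fraction of reads whose length is at least min_length."""
    lengths = list(read_lengths(fastq_handle))
    if not lengths:
        return 0.0
    return sum(length >= min_length for length in lengths) / len(lengths)


# Tiny in-memory example: one 100 bp read and one 50 bp read.
sample_fastq = (
    "@read1\n" + "A" * 100 + "\n+\n" + "I" * 100 + "\n"
    + "@read2\n" + "C" * 50 + "\n+\n" + "I" * 50 + "\n"
)
print(fraction_at_least(io.StringIO(sample_fastq)))
```

For a real file, the same function can be called on a handle opened with `open(path)`; gzip-compressed files would need `gzip.open(path, "rt")` instead.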

References

BAL Metagenomic Samples: Wuhan Institute of Virology, Chinese Academy of Sciences, Zheng Shi; 2020-02-11 https://www.ncbi.nlm.nih.gov/biosample/SAMN14082199

Zhou, et al. (2020) Nature https://www.nature.com/articles/s41586-020-2012-7

Gralinski, et al. (2020) Viruses https://www.mdpi.com/1999-4915/12/2/135/htm

Centers for Disease Control and Prevention https://tools.cdc.gov/medialibrary/index.aspx#/microsite/id/403323

Illumina (2017). TruSeq Stranded Total RNA Reference Guide. Accessed February 13, 2020.

Image Credit: Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Manoj Dadlani

Mr. Manoj Dadlani serves as Chief Executive Officer at CosmosID, Inc., the Maryland-based provider of industry-leading solutions for unlocking the microbiome. Previously, Mr. Dadlani served as a partner at Applied Value Group, a management consulting and investment firm, and was co-founder and CEO at Rasa Industries, Ltd., a leading beverage manufacturing company. Mr. Dadlani has substantial experience in strategy, M&A, supply chain management, product development, marketing and business development. Mr. Dadlani received his bachelor’s and master’s degrees in Biological Engineering from Cornell University. Services offered by CosmosID’s CLIA-certified and GLP laboratory cover the entire workflow from study design to sample collection, extraction, library preparation, sequencing, data analysis and publication support. CosmosID’s cloud-based metagenomics application offers user-friendly access to the largest curated databases for microbial genomics, antimicrobial resistance and virulence data and has been independently validated to return metagenomic analyses at strain-level resolution with industry-leading sensitivity and precision.