The entire world is dealing with a global pandemic caused by the rapidly spreading coronavirus SARS-CoV-2. In response to this very serious public health threat, scientists are working collaboratively to provide information and tools to combat the virus. At CosmosID, we have been monitoring both the outbreak and the latest research results, especially RNA-sequencing-based data made available as the outbreak progresses. In this blog post we share both resources and the progress made in CosmosID metagenomic detection of SARS-CoV-2. The CosmosID team is eager to work with you to analyze and manage this major health emergency.
Emergence of SARS-CoV-2 has caused a pandemic and global health biothreat
SARS-CoV-2 is spread by human-to-human transmission via respiratory droplets or direct contact
Monitoring and controlling infection to prevent the spread of SARS-CoV-2 constitute the current primary intervention
CosmosID accurately detects SARS-CoV-2 in samples, providing metagenomic analysis
Researchers can upload metagenomic sequence files to CosmosID for SARS-CoV-2 identification and characterization
What is SARS-CoV-2?
SARS-CoV-2 stands for “Severe Acute Respiratory Syndrome Coronavirus 2” and is commonly referred to simply as “coronavirus”. This pathogenic virus causes coronavirus disease (COVID-19) and belongs to a family of single-stranded RNA viruses. The virus genome spans 29,891 nucleotides. Viruses of this type are found in many animal species and can cross the species barrier to infect humans. When the genomic sequence became available, scientists compared it with other available coronavirus genomes and concluded that it was a novel virus whose closest known relatives are SARS-CoV, the virus that caused the SARS outbreak in 2003, and coronaviruses carried by bats.
Automated placement of SARS-CoV-2 to closest known viruses
On January 22, 2020 we downloaded the novel coronavirus genome (now called SARS-CoV-2) and analyzed it using the CosmosID metagenomics analysis platform (https://app.cosmosid.com). We had not yet added this new genome to our database and wanted to see which genomes, if any, our algorithms would find as closest matches. The results we obtained within minutes correspond with the phylogenetic research being done by the genomics community.
As shown in Figure 1, the CosmosID platform detected SARS coronavirus at the species level and Bat coronavirus BM48-31/BGR/2008 at the strain level. The platform compares NGS reads to sequence signatures (i.e., k-mers) in a database arranged as a phylogenetic tree, which contains unique and shared k-mers that map to each level of the tree. Within minutes the fully automated analysis identified k-mers that pointed to the same bat coronavirus (highlighted in Figure 2) that Zhou et al. (2020) had identified as phylogenetically similar to SARS-CoV-2.
This example demonstrates that the unique phylogenetic structure of the CosmosID database even allows a meaningful classification of pathogens that at the point of analysis were still unknown to the world (and therefore the database). Nature published the findings by Zhou et al. (2020) the day after CosmosID concluded our analysis.
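The idea of unique and shared k-mers attached to levels of a phylogenetic tree can be illustrated with a toy sketch. This is not the CosmosID algorithm or database. The sequences, the names `strain_A`, `strain_B`, `strain_C` and `clade_AB`, the k-mer length, and all function names are hypothetical and chosen only to make the principle visible: a read from an unseen relative can still hit k-mers shared by a whole clade even when it matches no strain-specific k-mers.

```python
from collections import defaultdict

K = 5  # toy k-mer length (assumption); real tools use much longer k-mers


def kmers(seq, k=K):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_signatures(genomes, clades):
    """Assign k-mers to tree nodes.

    genomes: {genome_name: sequence}          (leaves of the toy tree)
    clades:  {clade_name: [genome_name, ...]} (internal nodes, non-empty)
    """
    per_genome = {name: kmers(seq) for name, seq in genomes.items()}
    signatures = {}
    # Leaf nodes: k-mers found in exactly one genome.
    for name, own in per_genome.items():
        others = set().union(*(s for n, s in per_genome.items() if n != name))
        signatures[name] = own - others
    # Internal nodes: k-mers shared by every member and absent elsewhere.
    for clade, members in clades.items():
        shared = set.intersection(*(per_genome[m] for m in members))
        outside = set().union(
            *(s for n, s in per_genome.items() if n not in members)
        )
        signatures[clade] = shared - outside
    return signatures


def classify(reads, signatures):
    """Count, per tree node, how many signature k-mers the reads contain."""
    hits = defaultdict(int)
    for read in reads:
        read_kmers = kmers(read)
        for node, sig in signatures.items():
            hits[node] += len(read_kmers & sig)
    return dict(hits)


# Hypothetical toy genomes: A and B share a prefix, C is unrelated.
genomes = {
    "strain_A": "ACGTACGGTTCA",
    "strain_B": "ACGTACGCATGA",
    "strain_C": "TTGGCCAATTGG",
}
clades = {"clade_AB": ["strain_A", "strain_B"]}
signatures = build_signatures(genomes, clades)

# A read from a "novel" relative that is not in the database.
hits = classify(["ACGTACGAAA"], signatures)
print(hits)
```

In this toy run the novel read matches only k-mers assigned to `clade_AB` and none of the strain-specific ones, which mirrors how an unknown virus can be placed next to its closest known relatives.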
Detecting SARS-CoV-2 in metagenomic samples
The SARS-CoV-2 genome is now in the CosmosID database. As the virus continues to spread, it is becoming critical to detect and classify the virus in patient samples so that individuals suspected of carrying the disease can be identified. While many labs around the world are using RT-PCR to detect SARS-CoV-2, a more precise method of detection is sequencing RNA in patient samples. A potential limitation of this method is that only a small percentage of the reads may come from the virus of interest, making it potentially more difficult to identify. The upside of metagenomic sequencing, on the other hand, is its ability to readily detect secondary pathogens that patients infected with SARS-CoV-2 may have acquired.
To assess the performance of CosmosID using metagenomic samples from patients diagnosed with COVID-19, our team analyzed nine bronchoalveolar lavage (BAL) metagenomic samples (deposited in the NCBI Sequence Read Archive under https://www.ncbi.nlm.nih.gov/biosample/SAMN14082199) through the CosmosID cloud application after we had added the SARS-CoV-2 genome to the CosmosID viral database.
These samples challenged our metagenomic analysis platform in two ways: in addition to the coronavirus they contained the microbial background associated with BAL samples, and several of them were sequenced at shallow depth (5M reads or fewer). Even so, we were able to identify SARS-CoV-2 in all of the samples. In addition, as you’d expect when using metagenomic sequencing, the platform reported other bacteria and viruses found in the respiratory samples, as shown in the heat map in Figure 4.
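To make the depth limitation concrete, here is a rough back-of-the-envelope sketch of how many viral reads one might expect at a given sequencing depth. The viral fractions and the 10-read threshold below are purely hypothetical assumptions for illustration; they are not measurements from the BAL samples and not a CosmosID detection rule. The calculation uses a simple Poisson approximation.

```python
import math


def expected_viral_reads(total_reads, viral_fraction):
    """Expected number of viral reads if viral_fraction of all reads are viral."""
    return total_reads * viral_fraction


def prob_at_least(min_reads, total_reads, viral_fraction):
    """Poisson approximation of the chance of seeing >= min_reads viral reads."""
    lam = total_reads * viral_fraction
    p_fewer = sum(
        math.exp(-lam) * lam ** i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - p_fewer


DEPTH = 5_000_000   # shallow run, matching the "5M reads" mentioned above
MIN_READS = 10      # hypothetical number of reads wanted for a confident call

for fraction in (1e-5, 1e-6):  # hypothetical viral read fractions
    mean = expected_viral_reads(DEPTH, fraction)
    p = prob_at_least(MIN_READS, DEPTH, fraction)
    print(f"fraction={fraction:g}: ~{mean:.0f} viral reads expected, "
          f"P(>= {MIN_READS} reads) = {p:.3f}")
```

Under these assumed numbers, a tenfold drop in the viral fraction moves the run from an average of about 50 viral reads to about 5, which is why low viral load combined with shallow depth can make detection harder.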
How To Run Your Metagenomic Samples
CosmosID can detect and identify SARS-CoV-2 in samples using metagenomic analysis. Researchers may use the CosmosID platform for this analysis. CosmosID does not provide diagnostic tests, but the CosmosID application is highly suitable for research purposes. Please reach out to us at email@example.com to learn more. Our team is eager to work with the community to achieve a better understanding of the virus and to help contain it. We need to work together so that our lives can return to normal.
Recommended End to End Workflow
Collect Sample → RNA Extraction (QIAGEN QIAamp Viral Mini Kit, PN 52904) → RiboZero Gold rRNA depletion protocol to remove human cytoplasmic and mitochondrial rRNA (Illumina; 48 samples, Cat no. 20020598; 96 samples, Cat no. 20020599) → Sequencing-ready library preparation using the TruSeq Stranded Total RNA Library Prep Gold kit (Illumina, Cat no. 20020599) along with the IDT for Illumina TruSeq RNA UD Indexes (96 indexes, 96 samples) (Illumina, Cat no. 20022371), performing RNA fragmentation, first- and second-strand cDNA synthesis, adenylation, adapter ligation, and amplification according to the TruSeq Stranded Total RNA protocol → Quantify and pool the prepared libraries, then load them onto your preferred DNA sequencer (we recommend sequencing reads at least 75 bp in length) → Analyze sequence data on app.cosmosid.com
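Before uploading, it can be handy to check that your reads meet the recommended minimum length of 75 bp. The following is a minimal sketch, not a CosmosID tool: it assumes standard four-line-per-record FASTQ (no wrapped sequence lines), and the function names and the in-memory demo records are hypothetical.

```python
import gzip
import io

MIN_READ_LENGTH = 75  # minimum read length recommended in the workflow above


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_lengths(handle):
    """Yield the length of each read, assuming 4 lines per FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.rstrip("\r\n"))


def summarize(handle, min_length=MIN_READ_LENGTH):
    """Count all reads and those shorter than min_length."""
    total = short = 0
    for length in read_lengths(handle):
        total += 1
        if length < min_length:
            short += 1
    return {"reads": total, "below_min": short}


# Demo on two made-up records: one 80 bp read and one 50 bp read.
demo = io.StringIO(
    "@r1\n" + "A" * 80 + "\n+\n" + "I" * 80 + "\n"
    "@r2\n" + "C" * 50 + "\n+\n" + "I" * 50 + "\n"
)
summary = summarize(demo)
print(summary)
```

For a real file you would call `summarize(open_fastq("your_sample.fastq.gz"))`, where the file name is a placeholder for your own data.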
BAL Metagenomic Samples: Wuhan Institute of Virology, Chinese Academy of Sciences, Zheng Shi; 2020-02-11 https://www.ncbi.nlm.nih.gov/biosample/SAMN14082199
Zhou, et al. (2020) Nature https://www.nature.com/articles/s41586-020-2012-7
Gralinski, et al. (2020) Viruses https://www.mdpi.com/1999-4915/12/2/135/htm
Centers for Disease Control and Prevention https://tools.cdc.gov/medialibrary/index.aspx#/microsite/id/403323
Illumina (2017). TruSeq Stranded Total RNA Reference Guide. Accessed February 13, 2020.
Image Credit: Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6