The CosmosID pipeline was developed to address several challenges in modern bioinformatics. It aims to deliver better taxonomic resolution than traditional k-mer and assembly-based approaches while remaining computationally efficient, and to achieve species, sub-species and strain-level resolution with less sequencing depth.
The CosmosID pipeline comprises two main elements:
The platform utilises Genbook, CosmosID’s curated microbial genomics database. Containing nearly 170,000 phylogenetically organized genomes and gene sequences, the database enables multi-kingdom identification of bacteria, viruses, phages, fungi and protists, as well as of antimicrobial resistance (AMR) and virulence factors (VF), in microbiome samples.
With industry-leading detection specificity, sensitivity and precision (see image), CosmosID’s algorithms achieve strain-level resolution by identifying taxonomically informative and phylogenetically stable markers (k-mers) in your sample that map to the CosmosID database at different levels of the phylogenetic tree.
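To make the idea of markers mapping to different levels of the tree more concrete, here is a minimal sketch. It is not CosmosID's algorithm, which is not described in detail here; it uses a generic lowest-common-ancestor scheme as a stand-in. The genomes, the lineage names and the value k=4 are all hypothetical toy values chosen for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


# Hypothetical toy lineages, listed from the root down to the strain.
LINEAGE = {
    "strain_A1": ("root", "species_A", "strain_A1"),
    "strain_A2": ("root", "species_A", "strain_A2"),
    "strain_B1": ("root", "species_B", "strain_B1"),
}

# Hypothetical toy genomes (real genomes are millions of bases long).
GENOMES = {
    "strain_A1": "ACGTACGGA",
    "strain_A2": "ACGTACTTA",
    "strain_B1": "TTGACCGTA",
}


def lowest_common_ancestor(genome_ids):
    """Return the deepest tree node shared by all the given genomes."""
    lineages = [LINEAGE[g] for g in genome_ids]
    lca = "root"
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


def build_marker_index(genomes, k):
    """Map each k-mer to the tree node that covers every genome containing it."""
    owners = defaultdict(set)
    for genome_id, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(genome_id)
    return {kmer: lowest_common_ancestor(sorted(ids)) for kmer, ids in owners.items()}


index = build_marker_index(GENOMES, k=4)
for kmer in ("TACG", "ACGT", "CGTA"):
    print(kmer, "->", index[kmer])
```

In this toy data, a k-mer found in a single genome becomes a strain-level marker, one shared by both strains of a species becomes a species-level marker, and one present in every genome only resolves to the root and so carries little taxonomic information.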
Generally, k-mer-based approaches are less computationally intensive and more cost-effective than assembly-based approaches. However, most pipelines underperform because of inefficient algorithms and reference databases that are incomplete and poorly structured.
By identifying not only unique but also shared k-mers across the phylogenetic structure of Genbook, CosmosID combines near-neighbour placement, phylogenetic inference and machine learning tools to yield a leading metagenomics pipeline, as demonstrated in numerous challenges:
Software developers at CosmosID have automated this award-winning pipeline and made it available through a user-friendly, interactive web-based application. Regardless of how much computational infrastructure you have available, you can interpret your data straight away within the hub, saving time.
The CosmosID-Hub features:
The CosmosID taxonomic and AMR/VF reference database comprises both publicly available genomes and gene sequences from NCBI RefSeq/WGS/SRA/nr, PATRIC, M5NR, IMG, ENA, DDBJ, CARD, ResFinder, ARDB, ARG-ANNOT, mvirdb, VFDB etc., and a subset of genomes sequenced by CosmosID and its private collaborators. Additional data are curated for functional characterization (UniProt, MetaCyc, GO terms).
The CosmosID algorithm utilises a k-mer-based approach, so genome assembly is not required! Because the algorithm searches the phylogenetic tree for short taxonomic markers, both those unique to an organism and those shared between organisms, we can tease apart microbes to the species and strain level without genome assembly or host DNA depletion. This is one of the many benefits of utilising k-mer-based bioinformatics for metagenomic studies.
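As a rough illustration of why no assembly step is needed, the sketch below profiles raw reads directly by looking up their k-mers in a marker table. It is a simplified stand-in, not the CosmosID implementation: the marker table, the taxon names, the reads and k=4 are hypothetical, and a real pipeline would add statistical filtering on top of raw hit counts.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


# Hypothetical marker table: k-mer -> taxon it is informative for.
# Strain-level entries are unique markers; species-level entries are shared.
MARKERS = {
    "TACG": "strain_A1",
    "ACGG": "strain_A1",
    "TACT": "strain_A2",
    "ACGT": "species_A",
    "GTAC": "species_A",
    "TTGA": "strain_B1",
    "GACC": "strain_B1",
}


def profile_reads(reads, markers, k):
    """Count marker hits per taxon directly from unassembled reads."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            taxon = markers.get(kmer)
            if taxon is not None:
                hits[taxon] += 1
    return hits


# Hypothetical short reads; the last one matches no marker at all.
reads = ["GTACGG", "TTGACC", "GGGGGG"]
hits = profile_reads(reads, MARKERS, k=4)
for taxon, count in sorted(hits.items()):
    print(taxon, count)
```

Each read is handled independently, so no contigs are ever built. In this sketch, reads whose k-mers match no marker (as host-derived reads typically would not) contribute nothing to the profile, which is why they do not need to be removed beforehand.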