The Carbon Footprint of Bioinformatics

Advancements in biological and biomedical research are fueled by the analysis of vast, complex, integrated datasets, made possible through large-scale computational resources.


While bioinformatics has revolutionized our understanding of diseases such as cancer or COVID-19, there are costs associated with the computing requirements. On one hand, computational power requires financial resources; on the other, it may also impact the environment and, in turn, human health.


In this article, we’ll explore the background and research into the carbon footprint of bioinformatics. We’ll also touch on our own carbon footprint, assessing the CosmosID-HUB Microbiome Platform in order to give you an idea of what kind of impact a bioinformatics project can have on the environment.


The energy consumption of computers contributes to greenhouse gas emissions, which have detrimental effects on both human health and the environment. The annual electricity usage of data centers and high-performance computing facilities already surpasses that of developed countries such as Ireland or Denmark.


Over the last decade, generating biological sequence data has become exponentially cheaper. As a result, research efforts now output orders of magnitude more biological data. On one hand, more data leads to biological discovery, improving human health.


On the other hand, the demand for generating more and more biological data is expected to keep rising. The exponential growth of biological databases has intensified the need for computational resources to analyze these vast datasets.


This would result in a surge of greenhouse gas emissions, a well-known, significant contributor to outdoor air pollution and climate change. Ambient air pollution alone is estimated to cause 4.2 million deaths annually, with 91% of the global population exposed to air quality below World Health Organization standards. 


Global warming leads to further adverse effects, including increased population exposure to wildfires, economic losses, and the vulnerability of older populations to heat waves.


Given the urgent global emergency of climate change, it becomes crucial to evaluate the carbon footprint of computational analyses and the associated tools to minimize environmental impacts. 


While fields like machine learning and astrophysics have begun studying the environmental impact of their computational work, computational biology has yet to address this issue comprehensively.


Furthermore, other aspects of biological research, such as equipment power consumption and travel, also contribute significantly to greenhouse gas emissions. 


Against this background, the study of Grealey et al. (2022) aims to estimate the carbon footprint of common bioinformatic tools by considering the energy consumption of different hardware components and the associated emissions from electricity production.
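
The accounting behind such an estimate can be sketched in a few lines. The example below is a minimal sketch, assuming the general form used by calculators such as Green Algorithms: energy is the runtime multiplied by the power drawn by the cores and the memory, scaled by the data center's power usage effectiveness (PUE), and emissions are that energy multiplied by the carbon intensity of the local electricity. The function name and every numeric input are illustrative assumptions, not values taken from Grealey et al. (2022).

```python
def carbon_footprint_kg(
    runtime_h: float,
    n_cores: int,
    power_per_core_w: float,
    core_usage: float,
    memory_gb: float,
    power_per_gb_w: float,
    pue: float,
    carbon_intensity_g_per_kwh: float,
) -> float:
    """Estimate the emissions of one job in kg CO2e (hypothetical helper)."""
    # Power drawn by the processing cores, scaled by how busy they are
    cores_w = n_cores * power_per_core_w * core_usage
    # Power drawn by the allocated memory (allocated, not just used)
    memory_w = memory_gb * power_per_gb_w
    # Energy in kWh, including data center overhead such as cooling (PUE)
    energy_kwh = runtime_h * (cores_w + memory_w) * pue / 1000.0
    # Emissions depend on how the electricity was produced
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


# Illustrative inputs only (assumed, not measured)
estimate = carbon_footprint_kg(
    runtime_h=10,
    n_cores=8,
    power_per_core_w=12.0,
    core_usage=1.0,
    memory_gb=64,
    power_per_gb_w=0.3725,
    pue=1.67,
    carbon_intensity_g_per_kwh=475.0,
)
print(f"{estimate:.2f} kg CO2e")
```

Because the memory term scales with the amount allocated, requesting more memory than a job needs raises the estimate even when the runtime is unchanged. The same job run where electricity has a lower carbon intensity, or in a data center with a lower PUE, yields a proportionally smaller figure.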


The study examined various bioinformatic approaches including genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations. 


The study also analyzed computing strategies that are widely and interchangeably used in bioinformatics, such as parallelization, CPU versus GPU usage, and cloud versus local computing infrastructure, as well as geographical factors.


Based on its findings, the study proposes guidelines for environmentally conscientious bioinformatics. According to the study, biobank-scale GWAS analyses emitted significant amounts of carbon dioxide equivalent.


Nonetheless, simple software updates, like transitioning from BOLT-LMM v1 to v2.3, may result in a remarkable 73% reduction in carbon footprint. Moreover, switching to more efficient data centers could decrease the carbon footprint by approximately 34%. The study also identified memory over-allocation as a significant contributor to the greenhouse gas emissions of bioinformatics.


Faster processors and greater parallelization reduce running time and operational costs for biological data generation, but they may lead to a higher carbon footprint. To facilitate understanding among scientists, the results are presented in relatable metrics, such as distances traveled by cars or amounts of carbon sequestered by trees. 
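
One way to see why a shorter running time does not guarantee lower emissions is the sketch below. It assumes Amdahl's law for the speedup, a fixed power draw for every allocated core, and made-up values for the single-core runtime and the parallel fraction; none of these numbers come from the study.

```python
def runtime_h(single_core_h: float, n_cores: int, parallel_fraction: float) -> float:
    """Runtime under Amdahl's law: only the parallel part gets faster."""
    serial_fraction = 1.0 - parallel_fraction
    return single_core_h * (serial_fraction + parallel_fraction / n_cores)


def core_energy_kwh(
    single_core_h: float,
    n_cores: int,
    parallel_fraction: float,
    power_per_core_w: float = 12.0,  # assumed power draw per core
) -> float:
    """Energy drawn by the cores alone, assuming every allocated core draws full power."""
    hours = runtime_h(single_core_h, n_cores, parallel_fraction)
    return hours * n_cores * power_per_core_w / 1000.0


# Hypothetical job: 24 h on one core, 90% of the work parallelizable
for n in (1, 4, 16):
    t = runtime_h(24.0, n, 0.9)
    e = core_energy_kwh(24.0, n, 0.9)
    print(f"{n:>2} cores: {t:5.2f} h, {e:.3f} kWh")
```

Under these assumptions the job finishes sooner as cores are added, yet the total energy rises, because the extra cores keep drawing power while the serial part of the work runs. Since emissions scale with energy, the carbon footprint rises as well.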


Expressed as car travel, for example, genome scaffolding emitted as much greenhouse gas as driving a car for 0.17 km, while metagenome assembly emitted as much as driving up to 1065 km.
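
Converting between emissions and car travel is a single multiplication or division. The sketch below assumes an emission factor of 0.175 kg CO2e per km for an average passenger car; this article does not state the factor the study used, so treat the constant and the function names as assumptions for illustration.

```python
# Assumed emission factor for an average passenger car (not stated in the article)
KG_CO2E_PER_CAR_KM = 0.175


def kg_to_car_km(kg_co2e: float, kg_per_km: float = KG_CO2E_PER_CAR_KM) -> float:
    """Express an amount of emissions as an equivalent driving distance."""
    return kg_co2e / kg_per_km


def car_km_to_kg(km: float, kg_per_km: float = KG_CO2E_PER_CAR_KM) -> float:
    """Convert a driving distance back into emissions."""
    return km * kg_per_km


# Distances quoted above, converted back to emissions under the assumed factor
for task, km in (("genome scaffolding", 0.17), ("metagenome assembly", 1065)):
    print(f"{task}: {km} km is about {car_km_to_kg(km):.2f} kg CO2e")
```

A different factor, for instance one for a specific country's vehicle fleet, would shift the results proportionally, which is why the factor is kept as a parameter.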


Comparing carbon emissions with these everyday examples raises awareness about the environmental impact of bioinformatics and provides actionable metrics for adopting greener practices. According to the paper, classifying DNA sequencing reads, a key bioinformatics process for microbiome profiling, also produces significant carbon emissions.


The paper illustrates a difference of at least two orders of magnitude in carbon emissions per Gb of DNA sequence classified: the long-read classifier MetaMaps emitted 3.65 kgCO2, whereas the short-read classifiers Kraken2 and Centrifuge emitted between 0.001 and 0.018 kgCO2.


At CosmosID, our environmental consciousness led us to measure the carbon emissions of our algorithm per Gb of short-read DNA classified. 


Using Green Algorithms, we calculated that CosmosID’s short-read classification algorithm emitted 308 mg of carbon dioxide per Gb of DNA classified when run on in-house servers and 225 mg when run on AWS (0.000308 kg and 0.000225 kg, respectively).


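Altogether, our calculation suggests that CosmosID short-read classification emits less carbon dioxide per Gb than the publicly available tools tested in the study of Grealey et al. (2022): several-fold less than the most efficient short-read classifier and orders of magnitude less than the long-read classifier MetaMaps.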
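
The size of these gaps can be checked with simple arithmetic on the per-Gb figures quoted in this article. The sketch below assumes the published and in-house figures are directly comparable, which may not hold exactly given differences in hardware, datasets, and accounting; the dictionary labels and the helper name are illustrative.

```python
import math

# kg CO2 per Gb of DNA classified, as quoted in this article
published = {
    "MetaMaps (long-read)": 3.65,
    "short-read classifiers, high end": 0.018,
    "short-read classifiers, low end": 0.001,
}
cosmosid = {"in-house servers": 0.000308, "AWS": 0.000225}


def fold_difference(reference_kg: float, ours_kg: float) -> float:
    """How many times larger the reference emissions are."""
    return reference_kg / ours_kg


for setup, ours in cosmosid.items():
    for tool, reference in published.items():
        fold = fold_difference(reference, ours)
        orders = math.log10(fold)
        print(f"{setup} vs {tool}: {fold:,.1f}x ({orders:.1f} orders of magnitude)")
```

The gap is largest against the long-read classifier and narrowest against the low end of the short-read range, so the comparison depends on which tool serves as the reference.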


To sum up, this study serves as a call to action for the bioinformatics community to address its environmental footprint. By understanding the carbon emissions associated with computational tools and analyses, researchers can make informed decisions to minimize their impact. 


This includes optimizing hardware usage, adopting energy-efficient computing practices, and exploring renewable energy sources for powering computational infrastructure. Reducing the carbon footprint of other aspects of biological research, such as laboratory equipment and travel, should also be prioritized.


Recognizing the environmental impact of bioinformatics and implementing sustainable practices will not only benefit human health and the environment but also ensure a more sustainable future for scientific research as a whole.




Barış Özdinç

Barış Özdinç analyzes microbiome research with his educational background in genetics and evolution. As a research analyst for CosmosID, he combines metagenomics and data analyses to identify microbial biomarkers in disease cohorts and evaluate microbiome research tools. His work involves curating microbiome data and creating interesting microbiome content for newsletters and blog posts. Barış Özdinç received his bachelor’s degree in genetics and master’s degree in biodiversity, evolution, and conservation from University College London (UCL). Currently, he lives in Istanbul, Turkey, with his cat, Delight, and mentors female students in their STEM career pursuits.