
Reproducible bioinformatics workflows: A case study with software containers and interactive notebooks

Formal Metadata

Title
Reproducible bioinformatics workflows: A case study with software containers and interactive notebooks
Title of Series
Number of Parts
23
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Specifying bioinformatics workflows reproducibly is challenging because of their complexity. We developed a new statistical method in the field of circadian rhythmicity that makes it possible to rigorously determine whether measured quantities such as gene expression are not rhythmic. The statistical method itself is implemented in the R package "HarmonicRegression", available from the CRAN repository. However, the bioinformatics workflow is much larger than the statistical test. For instance, to ensure the validity of the statistical method, we simulated data sets of 20,000 gene expressions over a large range of parameter combinations (e.g. sampling interval, fraction of rhythmicity, number of outliers). We now demonstrate the use of Jupyter notebooks to document and distribute our statistical method and its application to both simulated and experimental data sets. The notebook runs inside a Docker software container, which ensures complete long-term reproducibility of the workflow.
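
The simulation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' workflow and does not use the API of the "HarmonicRegression" R package: it is a standard-library Python illustration that assumes a 24-hour cosine model with Gaussian noise and fits mean, amplitude and phase by ordinary least squares. All function names, parameter names and default values (simulate_dataset, fit_harmonic, noise_sd, the amplitude range) are hypothetical, and outliers are not modelled.

```python
import math
import random


def simulate_dataset(n_genes, sampling_interval_h, duration_h,
                     rhythmic_fraction, period_h=24.0, noise_sd=0.2, seed=0):
    """Simulate noisy expression profiles; a fixed fraction is rhythmic.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so that reruns give identical data
    n_samples = int(duration_h / sampling_interval_h)
    times = [i * sampling_interval_h for i in range(n_samples)]
    n_rhythmic = round(n_genes * rhythmic_fraction)
    profiles, is_rhythmic = [], []
    for gene in range(n_genes):
        rhythmic = gene < n_rhythmic
        amplitude = rng.uniform(0.5, 1.0) if rhythmic else 0.0
        phase = rng.uniform(0.0, 2.0 * math.pi)
        profiles.append([
            1.0
            + amplitude * math.cos(2.0 * math.pi * t / period_h - phase)
            + rng.gauss(0.0, noise_sd)
            for t in times
        ])
        is_rhythmic.append(rhythmic)
    return times, profiles, is_rhythmic


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_harmonic(times, values, period_h=24.0):
    """Least-squares fit of y = m + a*cos(w*t) + b*sin(w*t).

    Returns (mean, amplitude, phase), where the fitted curve equals
    mean + amplitude * cos(w*t - phase).
    """
    w = 2.0 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    # Normal equations: (A^T A) x = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mean, a, b = solve_linear(ata, aty)
    return mean, math.hypot(a, b), math.atan2(b, a)


if __name__ == "__main__":
    times, profiles, flags = simulate_dataset(
        n_genes=10, sampling_interval_h=4, duration_h=48,
        rhythmic_fraction=0.3)
    for profile, flag in zip(profiles, flags):
        mean, amplitude, phase = fit_harmonic(times, profile)
        print(f"rhythmic={flag} fitted amplitude={amplitude:.2f}")
```

Fixing the random seed reflects the reproducibility concern of the abstract at the smallest scale. Deciding rigorously whether a profile is non-rhythmic requires the statistical test from the talk, which this sketch does not reproduce; the fitted amplitude here is only a descriptive quantity.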