Bioinformatics Program

The ability to sequence genomes and consider how the unique nature of an individual's genetics may influence their health has transformed modern medicine. With this technology has come a computational need to analyze and interpret these incredibly large datasets, as each person's genome contains more than 6 billion DNA bases, represented by the letters A, G, T and C.

Multiply this huge volume of genomic data by that produced in the growing fields of other omics — including transcriptomics, metabolomics, proteomics and more — and the opportunities and challenges are immense. Finding disease-related genetic and molecular variants within these data is like searching for the proverbial needle in a haystack.

The Bioinformatics Program consists of experts in computational biology and data science who collaborate with clinical investigators to execute analyses that process and interpret omic data. These analyses support and advance translational research within the Center for Individualized Medicine, leading to individualized tests and treatments for patients. When needed, the program develops novel bioinformatics methods to further data processing, integrate multiple data types and improve data interpretation.

The program builds on the already extensive bioinformatics resources and activities in the Department of Quantitative Research Science at Mayo Clinic, as well as on Mayo Clinic collaborations with the University of Minnesota, Arizona State University, the University of Illinois at Urbana-Champaign and the National Center for Supercomputing Applications.

Areas of focus

Data pre-processing

Data pre-processing is a critical step in converting raw omic data into biologically interpretable information. As the technology evolves and knowledge increases within the field, the Center for Individualized Medicine's Bioinformatics Program remains committed to maintaining the highest standards in data analytics.

The program has designed and implemented a suite of pre-processing workflows, including several open-source or in-house applications that have been calibrated for optimal variant calling. Workflows are also engineered to run on highly parallel systems and enabled for cloud computing.

Workflows are available to pre-process:

  • DNA-seq data for the characterization of:
    • Variants (single-nucleotide variants, indels)
    • Structural variants (copy number variation, translocation, inversion)
  • mRNA-seq data for the characterization of:
    • Gene expression levels
    • Allele-specific expression
    • Splice variants
    • Fusion genes
    • Outlier analyses
  • Methyl-seq data from the RRBS protocol and whole-genome sequencing
  • Chromatin immunoprecipitation sequencing (ChIP-seq) and ChIP-exo data for the identification of transcription factor binding sites and histone modification
  • ATAC-seq
  • Long-read assembly and analysis (PacBio, 10X Genomics, Moleculo)
  • Multiomics integrative analyses
  • Pathway, network and predicted functional enrichment analyses
  • Analytical support for samples from humans, animal models and cell lines
  • miRNA-seq data for the quantification of microRNAs
  • lincRNA data for the identification and quantification of long noncoding RNAs
  • Single-cell technology workflows
  • Microbiome and metagenomics data
  • Proteomics and proteogenomics data
  • Metabolomics data
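
As a rough illustration of the small-variant categories named in the DNA-seq item above (single-nucleotide variants and indels), the sketch below labels a variant by comparing the lengths of its reference and alternate alleles. It is a simplified teaching example, not the program's workflow: the function name `classify_variant` and the sample records are hypothetical, and structural variants are outside its scope.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Label a simple variant from its reference and alternate alleles.

    Assumption: alleles are plain base strings, as in the REF and ALT
    columns of a VCF record. Symbolic alleles are not handled.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"        # one base replaced by another
    if len(ref) < len(alt):
        return "insertion"  # alternate allele is longer
    if len(ref) > len(alt):
        return "deletion"   # alternate allele is shorter
    return "MNV"            # same length, more than one base


# Hypothetical (ref, alt) pairs used only for illustration.
records = [("A", "G"), ("C", "CTT"), ("GAT", "G"), ("AC", "GT")]
counts = Counter(classify_variant(ref, alt) for ref, alt in records)
print(dict(counts))
```

Production variant callers apply far more context (alignment, quality scores, normalization of allele representation) before assigning a variant type.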

Data analysis and integration

A critical step in extracting disease-relevant information from raw data is annotating the omic data and integrating it with clinical correlate data. This often includes combining multiple omic data types to enable a more comprehensive understanding of the disease or condition being investigated.

To support this activity, the Center for Individualized Medicine has created the Omics Data Platform, an enterprisewide repository of all research and clinical sequencing data generated from Mayo Clinic patients. Integrated applications and user or scripting interfaces enable Mayo Clinic investigators to query and analyze this comprehensive data resource.

Connections to the Enterprise Data Trust — a Mayo Clinic data warehouse that stores patients' clinical histories — and other clinical data repositories enable bioinformaticians to model and study clinical and omics information about Mayo Clinic patients, as well as biological knowledge from both Mayo Clinic and public sources, in the context of a larger biological system.

Data interpretation

The program's experts analyze omic data and work with investigators and clinicians to interpret the results.

They use association methods to identify variants that are significantly associated with the disease phenotype or directly involved in the biological mechanisms underlying complex diseases. Targeted methods evaluate single-gene changes that may be implicated in disease and enable individualized patient diagnoses. In the data interpretation process, the program looks for both the "what" and the "why," in other words, the cascade of molecular events that leads to the development of a disease.
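
One common form of association method can be sketched as an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The example below is a minimal sketch under stated assumptions, not the program's actual method: the function name `allelic_chi_square` and all counts are hypothetical, every row and column total is assumed to be nonzero, and no correction for multiple testing or population structure is applied.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    Assumption: all row and column totals are nonzero, so every
    expected count is positive.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: alternate and reference alleles in
# cases, then in controls.
chi2, p_value = allelic_chi_square(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

In a genome-wide setting, a test like this would be repeated across many variants, so a significance threshold adjusted for multiple comparisons would be needed before calling any single variant associated.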

The mechanistic understanding of a disease may lead to new or better ways of treating patients by identifying existing drugs that can be repositioned to treat different conditions or by discovering new drug targets for which new therapies can be developed.

Methods development

The Bioinformatics Program develops new methods and publishes them in peer-reviewed journals. Methods and applications are made freely available via bioinformatics software packages.

Program leader