Statistics and Informatics Software
The Center for Clinical and Translational Science (CCaTS), the Division of Biomedical Statistics and Informatics, and the CCaTS Biostatistics, Epidemiology and Research Design (BERD) Resource at Mayo Clinic collaborate to offer a series of online professional development modules on statistics and informatics software.
CME: Mayo Clinic College of Medicine and Science is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians.
Mayo Clinic College of Medicine and Science designates this enduring material for a maximum of 1 AMA PRA Category 1 Credit(s)™. Physicians should claim only the credit commensurate with the extent of their participation in the activity.
Mayo Clinic employees: Enroll now
Find CCaTS online modules in My Learning.
Click Learning and search by Category or Course Title.
About JMP software
JMP, a statistical software package from SAS, is designed for dynamic data visualization. It allows study teams to obtain descriptive statistics and perform simple data analysis.
For those who need personalized assistance, the CCaTS Service Center also offers one-on-one statistical and epidemiological consultations.
All modules in this series are presented by Ross A. Dierkhising, a master's-level biostatistician who also consults through the CCaTS BERD Resource.
Introduction to JMP for Research
"JMP Dataset Creation"
At the completion of this module, learners will be able to create a new dataset by entering data into a JMP table, define column properties for a variable, import a dataset from another file (such as Excel) and export a JMP dataset to another file type (such as Excel). Originally released July 1, 2012; renewed Jan. 1, 2017; credit expires Dec. 31, 2017.
Non-Mayo participants: Enroll now
"Creating New Variables in JMP Datasets Using Formulas"
At the completion of this module, learners will be able to describe the functions of the formula editor, calculate the difference between two numeric variables, calculate body mass index from weight and height, use if/then statements, and calculate a time interval. No CME credit is available.
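As a rough illustration of the derived variables listed above, the sketch below computes them in Python rather than in JMP's formula editor. It is not taken from the module: the field names, the sample values and the category cut-offs (the commonly used adult thresholds of 18.5, 25 and 30 kg/m²) are assumptions made for the example.

```python
from datetime import date


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """If/then logic using assumed adult cut-offs of 18.5, 25 and 30 kg/m^2."""
    if value < 18.5:
        return "underweight"
    elif value < 25:
        return "normal"
    elif value < 30:
        return "overweight"
    return "obese"


def days_between(start, end):
    """Time interval between two dates, in whole days."""
    return (end - start).days


# Hypothetical subject record; every field name here is illustrative only.
record = {
    "weight_kg": 70.0,
    "height_m": 1.75,
    "sbp_baseline": 140,
    "sbp_followup": 128,
    "enrolled": date(2020, 1, 1),
    "last_visit": date(2020, 1, 31),
}

# Difference between two numeric variables.
record["sbp_change"] = record["sbp_followup"] - record["sbp_baseline"]
# Body mass index and a category derived from it with if/then logic.
record["bmi"] = bmi(record["weight_kg"], record["height_m"])
record["bmi_group"] = bmi_category(record["bmi"])
# Time interval between two date variables.
record["followup_days"] = days_between(record["enrolled"], record["last_visit"])
```

In JMP itself, the same calculations would be entered as column formulas; the Python functions only show the arithmetic behind them.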
"JMP Dataset Manipulations"
At the completion of this module, learners will be able to subset rows from a dataset, sort data by one or more variables, concatenate two datasets, check for duplicate subjects in a dataset, and join two datasets. No CME credit is available.
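The operations named above can be sketched outside JMP as well. The following Python example is only an illustration under stated assumptions: tables are plain lists of dictionaries, the column names (`id`, `age`, `hgb`) and values are hypothetical, and the join assumes the key is unique in the right-hand table.

```python
from collections import Counter

# Hypothetical tables; all names and values are illustrative only.
site_a = [{"id": 1, "age": 54}, {"id": 2, "age": 61}]
site_b = [{"id": 3, "age": 47}, {"id": 2, "age": 61}]
labs = [{"id": 1, "hgb": 13.5}, {"id": 3, "hgb": 12.1}]


def concatenate(*tables):
    """Stack the rows of several tables that share the same columns."""
    return [dict(row) for table in tables for row in table]


def duplicate_ids(table, key="id"):
    """Return the key values that occur more than once."""
    counts = Counter(row[key] for row in table)
    return sorted(k for k, n in counts.items() if n > 1)


def subset(table, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def sort_by(table, *keys):
    """Sort rows by one or more columns, in the order given."""
    return sorted(table, key=lambda row: tuple(row[k] for k in keys))


def inner_join(left, right, key="id"):
    """Join two tables on a key, assuming the key is unique in `right`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


combined = concatenate(site_a, site_b)
repeated = duplicate_ids(combined)            # subject 2 appears twice
older = subset(combined, lambda r: r["age"] >= 50)
ordered = sort_by(combined, "age", "id")
joined = inner_join(combined, labs)
```

Checking for duplicate subjects before joining matters because a repeated key silently multiplies or overwrites rows in most join implementations.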
"Computation of Descriptive Statistics and How to Save Results in JMP"
At the completion of this module, learners will be able to describe how variable modeling types determine which statistics are computed, identify where to find specific descriptive statistics in the output, use a "by" variable to obtain descriptive statistics within groups, save output in various formats and journal an output to collate results. No CME credit is available.
Basic Statistical Methods Using JMP
"Analysis of Means Using JMP"
At the completion of this module, learners will be able to conduct a one-sample test of a mean, conduct a two-sample test of means, compare means from more than two independent groups and compare two dependent means. No CME credit is available.
"Analysis of Proportions Using JMP"
At the completion of this module, learners will be able to conduct a one-sample test of a proportion, conduct a two-sample test of proportions, compare proportions from more than two independent groups, conduct a one-sample test for a multinomial distribution, compare multinomial distributions from independent groups and compare two dependent proportions. No CME credit is available.
"Agreement Analysis Using JMP"
At the completion of this module, learners will be able to assess agreement between measurements by estimating the kappa statistic for categorical variables and creating a Bland-Altman plot for continuous variables. No CME credit is available.
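For readers who want to see the arithmetic behind these two agreement measures, here is a minimal Python sketch. It is not JMP output and is not drawn from the module: the function names and sample ratings are hypothetical, the kappa shown is the unweighted Cohen's kappa for two raters, and the Bland-Altman limits use the conventional mean difference plus or minus 1.96 standard deviations.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Undefined (division by zero) if both raters use a single category.
    return (observed - expected) / (1 - expected)


def bland_altman(x, y):
    """Points and summary lines for a Bland-Altman plot of paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    means = [(a + b) / 2 for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "means": means,   # horizontal axis: average of the two methods
        "diffs": diffs,   # vertical axis: difference between the methods
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Hypothetical ratings of 10 subjects by two raters.
a = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
b = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]
kappa = cohen_kappa(a, b)

# Hypothetical paired continuous measurements from two methods.
limits = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 13.0, 13.0, 17.0])
```

In this example the raters agree on 8 of 10 subjects, chance agreement is 0.5, and kappa works out to 0.6.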
"Linear Regression and Correlation Using JMP"
At the completion of this module, learners will be able to fit a simple linear regression model with a continuous or categorical predictor, estimate correlation coefficients and fit a multiple linear regression model. No CME credit is available.
"Logistic Regression and ROC Curves Using JMP"
At the completion of this module, learners will be able to fit a simple logistic regression model with a continuous or categorical predictor, construct an ROC curve from a model with one continuous predictor and assess cut-off values, fit a multiple logistic regression model, and construct an ROC curve from a model with multiple predictors and assess cut-off values. No CME credit is available.
"Survival (Time to Event) Analysis Using JMP"
At the completion of this module, learners will be able to estimate Kaplan-Meier survival and failure curves, properly estimate the median time to event, compare Kaplan-Meier curves between groups, fit a univariate Cox proportional hazards model with a continuous or categorical predictor, fit a multivariable Cox proportional hazards model, and recognize predictors that JMP cannot handle (time-dependent covariates). No CME credit is available.
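To make the Kaplan-Meier idea concrete, the following Python sketch computes the product-limit estimate by hand. It is an illustration only, not JMP's implementation: the function names and the sample data are hypothetical, events are coded 1 for an observed event and 0 for censoring, and subjects censored at an event time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_time(curve):
    """First time at which estimated survival is 0.5 or less, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data for six subjects.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
median = median_time(curve)
```

Reading the median from the estimated curve, rather than taking the median of the observed times, is what accounts for censored subjects; if the curve never reaches 0.5, the median is not estimable and the function returns `None`.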
The bioinformatics modules are intended for researchers who want to learn about the technologies available in the Medical Genome Facility (formerly the Advanced Genomics Technology Center) at Mayo Clinic and receive training on commercial and public bioinformatics software, public bioinformatics databases and genome browsers.
Effective use of bioinformatics software enables researchers to study — on a genome-wide scale — gene expression, exon composition of transcripts, protein binding sites, genotypes, gene copy number variations, DNA methylation and other molecular events.
All modules in this series are developed by Alexey A. Leontovich, Ph.D., of Mayo Clinic in Rochester, Minn., who also oversees bioinformatics consulting in CCaTS.
To view the online modules that might best fit your needs, see the Online Module Cross-Reference Guide.
"Overview of Bioinformatics Tools"
During a laboratory experiment, we may obtain RNA or DNA samples, which are then processed and analyzed. This module provides an overview of the technology that translates RNA or DNA information into digital form (that is, computer files). It focuses on the methods and software used to analyze these files and obtain a biological interpretation of the results. No CME credit is available.
"Essentials of Microarray Technology"
High-throughput microarray technologies enable researchers to study gene expression, exon composition of transcripts, protein binding sites, SNPs, gene copy number variations and other molecular events on the genome-wide scale. There are numerous platforms, but each follows a similar basic concept that must be understood to allow researchers to design quality experiments. No CME credit is available.
"Essentials of Microarray Technology: Affymetrix and Illumina Platforms"
High-throughput microarray technologies enable researchers to study gene expression, exon composition of transcripts, protein binding sites, SNPs, gene copy number variations and other molecular events on the genome-wide scale. Although there are multiple platforms implementing this technology, there are a number of key principles that are critical for understanding its potential as well as its limits.
Affymetrix has additional arrays that can be utilized to ask specific questions of a sample. Illumina microarray technology has proven to be one of the best platforms for gene expression profiling, microRNA and DNA methylation profiling, and SNP detection.
This module explains the key principles and features of Affymetrix microarray technology, the background on exon arrays and tiling arrays that use Affymetrix technology, and the key principles and features of Illumina microarray technology. It also provides background on how to analyze Illumina data. No CME credit is available.
"Obtaining Data and Gene Expression Profiles from Gene Expression Omnibus Microarray Database and File Decompression"
Gene Expression Omnibus (GEO) is a free database maintained by the National Center for Biotechnology Information (NCBI) that holds thousands of datasets from published expression studies. Public microarray databases are online repositories of microarray data of different types (gene expression, exon arrays and SNPs). They are often supplied with data analysis tools, visualization tools or both.
These databases contain data generated with different microarray platforms, including spotted arrays, Affymetrix, Illumina and Agilent. Researchers can use GEO to gather preliminary data on a gene or genes of interest. This module explains how to obtain experimental data from public databases, search for genes of interest and download the corresponding data. No CME credit is available.
"Using Partek Genomics Suite for Microarray Data Analysis: The Basics"
Partek GS is an excellent software application for the analysis of gene expression, exon composition of transcripts, copy number variation, gene annotation and more. This is an introductory-level module that covers basic functionalities of the software. No CME credit is available.
"Using Ingenuity Pathway Analysis Software for Gene Pathways Analysis"
Ingenuity Pathway Analysis is one of the main software applications for the analysis of molecular pathways and biological networks of genes and proteins, data mining of biological annotations, data visualization, and reporting. This is an introductory-level module that explains how to use the basic functionalities of the software. No CME credit is available.
"Introduction to Cytoscape Software"
Cytoscape is a free software application for analysis and integration of gene network and gene interaction data. It is powerful software for integration and visualization of complex biological data of various types, such as complex gene networks, gene expression data (microarray, PCR or next-generation sequencing), methylation data, gene copy number variations and more.
Cytoscape accepts various formats of gene/protein/metabolite interaction files, directly uploads data from a large number of databases, and has a large set of tools for gene/protein/metabolite functional analysis and annotation, including tools for Gene Ontology analysis.
At the completion of this module, you will be able to install Cytoscape on your computer, load data into the software, use the main controls and tools, perform a simple analysis, and visually represent the results. No CME credit is available.
"An Introduction to the Sequence Read Archive and Conversion of SRA Format to FASTQ Format"
Massively parallel sequencing technologies (next-generation sequencing, or NGS) are increasingly used to quantify gene expression, gradually replacing microarray technologies. Data generated by NGS platforms demand storage devices, data transfer methods and hardware that can efficiently handle very large volumes of data. The same applies to data analysis software.
In this module, we will show how NGS data obtained in gene expression experiments (RNA-seq) are archived in the Sequence Read Archive (SRA), a National Institutes of Health database. We will also explain how these data can be retrieved from the archive and used for data analysis. No CME credit is available.
"Introduction to Integrative Genomics Viewer (IGV)"
Genome browsers are software for biological interpretation and visualization of data. With the advent of next-generation sequencing (NGS) technologies, genome browsers became one of the key components of analytical workflows. They can also be used to mine genomics data and visualize results obtained with various other types of technologies.
One of the leading genome browsers is Integrative Genomics Viewer (IGV), which was developed at the Broad Institute. In this module, we will show how to use IGV for visualization and analysis of NGS data. No CME credit is available.
"Introduction to Galaxy Software"
Massively parallel sequencing, also called next-generation sequencing, generates massive amounts of data. Storage and analysis of these data require specialized software and hardware. Galaxy is a major free system (software and hardware) that meets those requirements.
Galaxy has software tools for the analysis of ChIP-seq, RNA-seq and DNA-seq data (including methylation analysis), transcription factor binding analysis, genotyping analysis, copy number variation, gene expression, and gene/DNA variant detection, as well as the EMBOSS package of tools.
In this module, you will learn how to set up a free account with Galaxy, use the main controls and import data from the UCSC Genome Browser, which is well integrated with Galaxy. You will also learn how to do a simple analysis using Galaxy. No CME credit is available.
"Loading Data into Galaxy Software"
Data files generated in experiments using next-generation sequencing technology are very large, up to hundreds of gigabytes (HiSeq). Smaller files (less than 2 GB) can be uploaded into the software directly from your computer, but to upload larger files (up to 50 GB), you need to use FTP client software.
This module shows how to upload small and big files into Galaxy. No CME credit is available.
"ChIP-seq Analysis with Galaxy Software (Part 1)"
This module shows how to analyze changes in the methylation status of DNA using data generated with a ChIP-seq method (more specifically, methylated DNA immunoprecipitation, or MeDIP) and sequenced with the Illumina Genome Analyzer IIx. We use FASTQ files as input data, so most techniques used in this analysis are applicable to other types of ChIP-seq data.
The whole analysis involves many steps, so to make it easier to understand and learn, we divided it into three parts, each of which explains a particular analytical process. This first part shows how to identify DNA regions that have a different level of methylation in different samples. No CME credit is available.
"Analysis of Genome-Wide Methylation Pattern Using Galaxy Software (Part 2)"
Various experimental treatments or biological states (such as embryonic development) may affect the genome-wide methylation pattern, that is, the number of hypermethylated sites in introns, exons, promoter regions and CpG islands.
Once differentially methylated sites (hypermethylated and hypomethylated) are identified, the next step in the analysis is to characterize the methylation pattern and find differences in it, that is, in the distribution of frequencies of differentially methylated sites across specified genomic regions. In this module, we explain how to do this type of analysis using Galaxy software. No CME credit is available.
"Methylation Analysis of Promoter Regions Using Galaxy Software (Part 3)"
It is widely accepted in the literature that methylation of promoter regions causes suppression of gene expression. Specific locations of methylated sites may affect binding of transcription factors to the promoter. This is why detailed analysis of the methylation of gene promoters is especially important in the context of the whole study.
In this module, we will explain how to analyze differential methylation of promoter regions on a genome-wide scale using Galaxy software. No CME credit is available.
"Preparation of FASTQ Files for RNA-seq Analysis Using Galaxy Software (Part 1)"
Sequencing data are commonly delivered as BAM or FASTQ files. It is preferable to start your analysis with FASTQ files so that you can check the quality of sequence reads and remove low-quality fragments. FASTQ files come in different flavors depending on the specifics of the sequencing platform that was used to generate them (for example, single read versus paired-end read).
There are software tools developed specifically to convert BAM files into high-quality FASTQ files. In this module, we demonstrate how to install these software tools on your computer and process BAM files. No CME credit is available.
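As a small illustration of what checking read quality means at the file level, the Python sketch below reads FASTQ records and keeps only reads whose mean quality score meets a threshold. It is not one of the tools demonstrated in the module. It assumes the simple four-line FASTQ layout, Phred+33 quality encoding and non-empty reads, and the function names, sample records and threshold of 20 are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoding by default."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(handle, min_mean_quality=20):
    """Keep records whose mean Phred score is at least min_mean_quality."""
    return [
        record
        for record in read_fastq(handle)
        if mean_phred(record[2]) >= min_mean_quality
    ]


# Two hypothetical reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
kept = filter_reads(sample)
```

Dedicated quality-control tools typically also trim low-quality ends and handle paired-end files, which this sketch does not attempt.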
"RNA-seq Analysis Using Galaxy Software (Part 2)"
Typically, the goal of RNA-seq analysis is to identify genes that are differentially expressed between the groups of samples being compared. The analysis can also help find alternatively spliced transcripts and alternative transcription start sites (TSS) in those groups.
Starting with FASTQ files, it takes numerous analytical steps to obtain the final result. There are multiple parameters for each algorithm used at each step of the analysis, and these parameters need to be set correctly. In this module, we walk you through the major steps of the analytical workflow and explain how to correctly set the parameters. No CME credit is available.
"Interpretation of the Results of RNA-seq Analysis Using Galaxy Software (Part 3)"
The output of the RNA-seq analysis is a set of genomic coordinates for the regions with the requested properties (differential gene expression, alternative splicing, alternative transcription start sites (TSS) and more). To understand the biological result of the experiment, these coordinates need to be annotated with gene and transcript symbols, names and IDs, TSS positions and more.
In this module, we demonstrate how to obtain these annotations and map them to the results of the RNA-seq analysis. No CME credit is available.
"Using UCSC Genome Browser for Data Visualization and Analysis"
Our knowledge about DNA elements of the genomes of different species is growing exponentially. The volume and complexity of genomic information pose many challenges for data analysis, one of which is visual representation of genomic data.
The genome browser developed by the University of California, Santa Cruz, known as the UCSC Genome Browser, is one of the main bioinformatics resources that handles this challenge. Not only does this software enable researchers to visually represent complex genomic data, but it also empowers scientists to explore vast amounts of genomic information gathered by the research community worldwide and recorded in the UCSC databases.
In this module, we explain how to visually represent genomic data using the UCSC Genome Browser and also how to search and retrieve specific information from UCSC databases. No CME credit is available.
"Gene Pathways Analysis With MetaCore Software"
MetaCore is one of the most advanced software programs for the analysis of molecular pathways and biological networks of genes and proteins, data mining of biological annotations, and data visualization and reporting. Because the input data is a list of gene or protein IDs, this software can analyze data obtained from various technological platforms, including microarray, next-generation sequencing, PCR, ELISA and protein mass spectrometry.
This is an introductory-level module that explains how to import data into the software and use basic functionalities for simple pathway and network analysis. No CME credit is available.