The Statistical Genetics and Genetic Epidemiology Lab has created an array of innovative software for the analysis of complex genetic mechanisms and genetic epidemiology.

Each software package is publicly available for use in biomedical research. The software incorporates Mayo Clinic's quantitative methods and includes well-documented procedures and examples for use.


The armitage R package performs the Armitage trend test to evaluate the association of a trait with SNP genotype predictors given a dose vector of length 3. (Software updated October 2015.)
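As a rough illustration of the idea, the sketch below computes the asymptotic Cochran-Armitage trend statistic for a binary trait from genotype counts and a dose vector of length 3. It is not the armitage package's interface: the function name `armitage_trend_test`, its arguments, and the restriction to a case-control trait are assumptions made for this example.

```python
import math


def armitage_trend_test(cases, controls, dose=(0.0, 1.0, 2.0)):
    """Cochran-Armitage trend test (1 df, asymptotic) from genotype counts.

    cases, controls: counts for the three SNP genotypes, in dose order.
    dose: scores assigned to the three genotypes (hypothetical default 0, 1, 2).
    Returns (chi-square statistic, p value).
    """
    if not (len(cases) == len(controls) == len(dose) == 3):
        raise ValueError("cases, controls and dose must each have length 3")

    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)

    sum_rt = sum(r * t for r, t in zip(cases, dose))
    sum_nt = sum(m * t for m, t in zip(totals, dose))
    sum_nt2 = sum(m * t * t for m, t in zip(totals, dose))

    denom = n_cases * (n - n_cases) * (n * sum_nt2 - sum_nt ** 2)
    if denom == 0:
        raise ValueError("statistic undefined: no variation in trait or dose")

    chi2 = n * (n * sum_rt - n_cases * sum_nt) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Changing the dose vector changes the genetic model being scored; for example, (0, 1, 1) would score a dominant effect rather than an additive one.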


CAVIARBF is a fine-mapping tool for identifying potential causal variants in a region where it's assumed that causal variants exist in the data.

CAVIARBF can be used to prioritize potential causal variants for follow-up functional analysis after performing genome-wide association studies. It uses an approximate Bayesian method and can deal with multiple causal variants.

One output is the marginal posterior probability of each variant being causal. The input requires only the marginal test statistics and correlations among variants, so it's also useful for analyzing meta-analysis results.

CAVIARBF is implemented in C++. (Software updated October 2015.)

See: Chen W, Larrabee BR, Ovsyannikova IG, Kennedy RB, Haralambieva IH, Poland GA, Schaid DJ. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics. 2015;200:719.


The GeneSetScan software offers a general approach to scan genome-wide SNP data for gene-set association analyses.

The test statistic for a gene set is based on score statistics for generalized linear models and takes advantage of the directed acyclic graph structure of the gene ontology to create gene sets. The method can use other gene-set structures, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), or even user-defined sets.

The approach of Dr. Schaid's Statistical Genetics and Genetic Epidemiology Lab combines SNPs into genes, and genes into gene sets, but ensures that positive and negative effects on a trait do not cancel. To control for multiple testing of many gene sets, the lab uses an efficient computational strategy that accounts for linkage disequilibrium and correlations among genes and gene sets, and provides accurate step-down adjusted p values for each gene set. (Software updated October 2014.)

See: Schaid DJ, Sinnwell JP, Jenkins GD, McDonnell SK, Ingle JN, Kubo M, Goss PE, Costantino JP, Wickerham DL, Weinshilboum RM. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies. Genetic Epidemiology. 2012;36:3.


The haplo.stats package is a suite of R routines for the analysis of indirectly measured haplotypes.

The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (because of unknown linkage phase of the genetic markers). The genetic markers are assumed to be codominant (that is, 1-to-1 correspondence between their genotypes and their phenotypes). (Software updated August 2015.)


The hwe R package allows users to test the fit of genotype frequencies to Hardy-Weinberg equilibrium proportions for autosomes and the X chromosome.

Different statistical tests are provided, along with an option to evaluate statistical significance by either exact methods or simulations. README and R source package are provided. (Software updated February 2011.)
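To make the exact-test option concrete, here is a minimal sketch of an exact Hardy-Weinberg test for a single diallelic autosomal marker. It conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the one observed. The function name `hwe_exact_test` is hypothetical, and this is not necessarily the algorithm or interface used by the hwe package.

```python
import math


def hwe_exact_test(n_aa, n_ab, n_bb):
    """Exact HWE p value for genotype counts AA, AB, BB at a diallelic locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given n, n_a and n_b under HWE.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * math.log(2.0)
                + math.lgamma(n + 1)
                - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1)
                + math.lgamma(n_a + 1)
                + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # The heterozygote count must have the same parity as the allele count.
    het_values = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: math.exp(log_prob(het)) for het in het_values}

    observed = probs[n_ab]
    # Small tolerance so that ties with the observed probability are included.
    p_value = sum(pr for pr in probs.values() if pr <= observed * (1 + 1e-9))
    return min(p_value, 1.0)
```

For example, `hwe_exact_test(50, 0, 50)` gives a very small p value because no heterozygotes were observed, whereas counts close to Hardy-Weinberg proportions give a p value near 1.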


The hweStrata program calculates an exact stratified test for Hardy-Weinberg equilibrium (HWE) for diallelic markers, such as single nucleotide polymorphisms (SNPs), exact tests for HWE within each stratum, and an exact test for homogeneity of Hardy-Weinberg disequilibrium. An update for version 1.0 checks whether the exact test for homogeneity can be computed; if not, the program calculates the p value using an asymptotic test.

The hweStrata software is written in the C programming language and is available as an executable for Linux x86_64 and Solaris, in addition to the source code. (Software updated May 2011.)

See: Schaid DJ, Batzler AJ, Jenkins GD, Hildebrandt MAT. Exact tests of Hardy-Weinberg equilibrium and homogeneity of disequilibrium across strata. American Journal of Human Genetics. 2006;79:1071.


The ibdreg package, written for S-PLUS and R, tests genetic linkage with covariates using regression methods in which the response is identity-by-descent (IBD) sharing for relative pairs. It accounts for correlations of IBD statistics and covariates for relative pairs within the same pedigree.

See: Schaid DJ, Sinnwell JP, Thibodeau SN. Robust multipoint identity-by-descent mapping for affected relative pairs. American Journal of Human Genetics. 2005;76:128.

README and package sources are provided. (Software updated December 2006.)


The kinship2 R package contains routines to handle family data with a pedigree object. The primary methods include the creation of pedigrees, plotting, trimming and the calculation of kinship matrices. (Software updated July 2015.)

See: Sinnwell JP, Therneau TM, Schaid DJ. The kinship2 R package for pedigree data. Human Heredity. 2014;78:91.
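For readers unfamiliar with kinship matrices, the following sketch shows the standard recursive calculation of kinship coefficients from a pedigree. It is not kinship2's interface: the function name `kinship_matrix`, the tuple layout of the pedigree, and the requirement that parents be listed before their children are assumptions made for this example.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients for a pedigree given as (id, father, mother) tuples.

    Founders have father and mother set to None. Parents must appear before
    their children (an assumption of this sketch).
    Returns (ids, phi), where phi[i][j] is the kinship of ids[i] and ids[j].
    """
    ids = [person for person, _, _ in pedigree]
    index = {person: k for k, person in enumerate(ids)}
    n = len(ids)
    phi = [[0.0] * n for _ in range(n)]

    for j, (_, father, mother) in enumerate(pedigree):
        if father is None or mother is None:
            # Founder: unrelated to everyone listed so far, not inbred.
            phi[j][j] = 0.5
            continue
        f, m = index[father], index[mother]
        if f >= j or m >= j:
            raise ValueError("parents must be listed before their children")
        # Kinship with an earlier member is the average over the two parents.
        for i in range(j):
            phi[i][j] = phi[j][i] = 0.5 * (phi[i][f] + phi[i][m])
        # Self-kinship is 1/2 * (1 + kinship between the parents).
        phi[j][j] = 0.5 * (1.0 + phi[f][m])

    return ids, phi
```

For a nuclear family with two children, this gives 0.5 on the diagonal, 0.25 for each parent-child and sibling pair, and 0 between the unrelated parents.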


The ld.pairs R package contains a method to compute composite measures of linkage disequilibrium, their variances and covariances, and statistical tests for all pairs of alleles from two loci when linkage phase is unknown. It is an extension of Weir and Cockerham (1989) to apply to multiallelic loci. README and package source are provided. (Software updated October 2015.)

See: Schaid DJ. Linkage disequilibrium testing when linkage phase is unknown. Genetics. 2004;166:505.
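As a simplified illustration of a composite measure, the sketch below handles only the special case of two diallelic loci, where the composite disequilibrium coefficient equals half the covariance between the allele counts (0, 1 or 2) at the two loci. It does not cover the multiallelic extension, variances, or tests that ld.pairs provides, and the function name `composite_ld` is hypothetical.

```python
def composite_ld(dosage_a, dosage_b):
    """Composite LD coefficient for two diallelic loci with unknown phase.

    dosage_a, dosage_b: per-subject counts (0, 1 or 2) of a chosen allele at
    locus A and locus B. Returns half the covariance of the two dosages.
    """
    n = len(dosage_a)
    if n == 0 or n != len(dosage_b):
        raise ValueError("dosage vectors must be non-empty and of equal length")

    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    # Covariance with divisor n, so the result matches the frequency-based
    # definition of the composite coefficient.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(dosage_a, dosage_b)) / n
    return cov / 2.0
```

Because only genotype dosages are needed, no haplotype phase has to be inferred.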


The mend.err R package checks pedigrees for Mendelian errors and, when errors are found, systematically jackknifes every typed pedigree member to determine whether eliminating that member removes all Mendelian errors from the pedigree. (Software updated February 2011.)
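The jackknife idea can be sketched as follows, under simplifying assumptions: errors are detected only by parent-offspring checks at a single marker, and "eliminating" a member means treating that member's genotype as missing. The function names and data layout are hypothetical and do not reflect mend.err's interface; checks confined to parent-offspring trios can also miss inconsistencies that only appear across several generations.

```python
def trio_consistent(child, father, mother):
    """Check a child's genotype (pair of alleles) against typed parents."""
    if child is None or (father is None and mother is None):
        return True  # nothing to check
    if father is None:
        return any(allele in mother for allele in child)
    if mother is None:
        return any(allele in father for allele in child)
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def count_errors(pedigree, genotypes):
    """Count offspring whose genotype is inconsistent with their parents."""
    errors = 0
    for person, father, mother in pedigree:
        if not trio_consistent(genotypes.get(person),
                               genotypes.get(father),
                               genotypes.get(mother)):
            errors += 1
    return errors


def jackknife_members(pedigree, genotypes):
    """Return typed members whose removal eliminates every detected error."""
    if count_errors(pedigree, genotypes) == 0:
        return []
    candidates = []
    for person, genotype in genotypes.items():
        if genotype is None:
            continue
        trial = dict(genotypes)
        trial[person] = None  # treat this member as untyped
        if count_errors(pedigree, trial) == 0:
            candidates.append(person)
    return candidates
```

In a family where the father is typed (1, 1), the mother (1, 2), and the children (1, 1) and (2, 2), removing either the father or the second child clears the error, so both are reported as candidates.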


PedBLIMP is a tool for genotype imputation of individuals with pedigree information.

PedBLIMP uses both relatedness information between family members and correlations among genotypes from an input reference panel. Family members can have completely missing genotypes or very different densities of genotyped markers. It is implemented in R but runs as a command-line tool with Rscript. (Software updated October 2015.)

See: Chen W, Schaid DJ. PedBLIMP: Extending linear predictors to impute genotypes in pedigrees. Genetic Epidemiology. 2014;38:531.


The pedgene R package performs gene-level kernel and burden association tests for genetic variants with disease status and continuous traits for pedigree data and unrelated subjects. (Software updated July 2015.)

See: Schaid DJ, McDonnell SK, Sinnwell JP, Thibodeau SN. Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic Epidemiology. 2013;37:409.

regmed: Regularized Mediation Analysis

regmed performs mediation analysis for multiple mediators using penalized structural equation models. The type of penalty depends on the setting: sparse group lasso when there are multiple mediators but only one exposure and one outcome variable, and lasso (L1) penalties when there are multiple exposures, multiple mediators, and multiple outcome variables.