Technical Reports

These Technical Reports are saved as PDF files. You will need Adobe Acrobat Reader software to view and print these Technical Reports. Some of these reports are large and may take a few minutes to load. Please be patient.

Credit Where Credit Is Due

If you use Technical Reports from Mayo Clinic, please acknowledge the original contributor of the material.

As an example:
TM Therneau, PM Grambsch, and VS Pankratz: Technical Report Series No. 66, Penalized Survival Models and Frailty. Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, 2000.

Copyright 2005 Mayo Foundation for Medical Education and Research, all rights reserved. Permission is granted for unlimited distribution for non-commercial use.

Technical Reports

#84 The Basics of Propensity Scoring and Marginal Structural Models
This report describes the basics of marginal structural models, propensity scores and inverse probability weighting. These methods are useful for addressing confounding in observational studies. A detailed SAS example is shown with discussion of model assumptions and checks.
Cynthia S. Crowson, Louis A. Schenck, Abigail Green, Elizabeth J. Atkinson, Terry M. Therneau [August 2013]
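
As a rough illustration of inverse probability weighting, the sketch below computes stabilized weights from propensity scores that are assumed to have been estimated already. It is written in Python rather than SAS, and the function names and toy data are hypothetical; it is not the report's SAS example.

```python
def stabilized_ip_weights(treated, propensity):
    """Return stabilized inverse probability of treatment weights.

    treated:    list of 0/1 treatment indicators
    propensity: estimated P(treated = 1 | covariates) for each subject
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    # Marginal probability of treatment, used as the stabilizing numerator.
    p_treated = sum(treated) / len(treated)
    weights = []
    for t, ps in zip(treated, propensity):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t == 1:
            weights.append(p_treated / ps)
        else:
            weights.append((1.0 - p_treated) / (1.0 - ps))
    return weights


def weighted_mean(values, weights):
    """Weighted mean of an outcome, e.g. within one treatment group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data (hypothetical): two treated and two untreated subjects.
weights = stabilized_ip_weights([1, 1, 0, 0], [0.8, 0.5, 0.5, 0.2])
```

Subjects whose observed treatment was unlikely given their covariates receive larger weights, so the weighted sample resembles one in which treatment is unrelated to the measured confounders.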



Comparison of Mayo Clinic Coding Systems
The goal of this project was to determine how three different disease coding systems (HICDA, ICD-9, and SNOMED CT) compared at retrieving lists of individuals with specific medical conditions from among patients seen at Mayo Clinic throughout 2004.
Jennifer St. Sauver, James Buntrock, Diana Rademacher, Debra Albrecht, Melissa Gregg, Donna Ihrke, Vinod Kaggal, and Amy Weaver [December 2010]



Attributable Risk Estimation in Cohort Studies
Population attributable risk (PAR) is a function of time because both the prevalence of a risk factor and its effect on exposed individuals may change over time, as may the underlying risk of disease. Time-specific PAR can be estimated based on cumulative incidence adjusted for the competing risk of death.
Cynthia S. Crowson, Terry M. Therneau, and W. Michael O'Fallon [September 2008]
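
For orientation, the sketch below evaluates population attributable risk at a single time point in two equivalent ways: from the overall and unexposed cumulative incidence, and from the exposure prevalence and relative risk (the classical Levin formula). The function names are hypothetical and the inputs are assumed to be supplied by the user; the competing-risk adjustment described in the report is not reproduced here.

```python
def par_from_incidence(overall_incidence, unexposed_incidence):
    """PAR at one time point: the fraction of overall cumulative incidence
    that would be removed if everyone had the unexposed incidence."""
    if overall_incidence <= 0:
        raise ValueError("overall incidence must be positive")
    return (overall_incidence - unexposed_incidence) / overall_incidence


def par_levin(prevalence, relative_risk):
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical values: half the population exposed, relative risk of 3,
# unexposed cumulative incidence of 0.1 (so overall incidence is 0.2).
par_a = par_from_incidence(0.2, 0.1)
par_b = par_levin(0.5, 3.0)
```

Because prevalence, relative risk, and baseline incidence can all change over time, each of these inputs would need to be re-evaluated at every time point of interest.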



Poisson models for person-years and expected rates
This report summarizes approaches used to model observed events with respect to the expected number of events. Examples are provided for examining the excess risk or relative risk using both additive and multiplicative models.
Elizabeth J. Atkinson, Cynthia S. Crowson, Rachel A. Pederson, Terry Therneau [September 2008]
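
A minimal sketch of the observed-versus-expected comparison follows. It assumes person-years and reference rates are already tabulated by stratum; the function names and numbers are hypothetical, and it computes only point estimates rather than fitting the Poisson models discussed in the report.

```python
def expected_events(person_years, reference_rates):
    """Expected event count: sum over strata of person-years times reference rate."""
    if len(person_years) != len(reference_rates):
        raise ValueError("one reference rate is needed per stratum")
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def relative_risk(observed, expected):
    """Multiplicative comparison: observed / expected (a standardized ratio)."""
    return observed / expected


def excess_rate(observed, expected, total_person_years):
    """Additive comparison: excess events per person-year of follow-up."""
    return (observed - expected) / total_person_years


# Hypothetical cohort with two age strata.
person_years = [100.0, 200.0]
reference_rates = [0.01, 0.02]  # events per person-year in the reference population
expected = expected_events(person_years, reference_rates)
ratio = relative_risk(10, expected)
excess = excess_rate(10, expected, sum(person_years))
```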



Concordance for Survival Time Data: Fixed and Time-Dependent Covariates and Possible Ties in Predictor and Time
Concordance, or synonymously the C-statistic, is a valuable measure of model discrimination in analyses involving survival time data. This report provides a definition of concordance in the case of survival data, allowing for time-dependent covariates with the counting process data representation and accounting for ties in the covariates and times.
Walter K Kremers and The William J. von Liebig Transplant Center [April 2007]
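
The sketch below computes a concordance statistic for right-censored data with fixed covariates only. The handling of ties is an assumption made for this illustration, not necessarily the report's definition: pairs tied on the risk score count one half, two events at the same time are excluded, and a subject censored at an event time is treated as outliving that event. Time-dependent covariates in counting process form are not covered.

```python
def concordance(times, events, risk):
    """Concordance for right-censored data.

    times:  follow-up times
    events: 1 (or True) if the event was observed, 0 (or False) if censored
    risk:   risk scores; a higher score should mean earlier failure
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("times, events and risk must have the same length")
    concordant = discordant = tied_risk = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject never defines the earlier failure
        for j in range(n):
            if i == j:
                continue
            # j is known to outlive i if its time is later, or if it was
            # censored at exactly the time i failed.
            later = times[j] > times[i] or (times[j] == times[i] and not events[j])
            if not later:
                continue
            if risk[i] > risk[j]:
                concordant += 1
            elif risk[i] < risk[j]:
                discordant += 1
            else:
                tied_risk += 1
    usable = concordant + discordant + tied_risk
    if usable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied_risk) / usable


# Hypothetical data: the third subject is censored.
c = concordance([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1])
```

The double loop is quadratic in the number of subjects, which is adequate for illustration but slow for large data sets.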



Finding Optimal Cutpoints for Continuous Covariates with Binary and Time-to-Event Outcomes
This report provides an overview of the literature and describes a unified strategy for finding optimal cutpoints with respect to binary and time-to-event outcomes. Two SAS macros for identifying a cutpoint have been developed in conjunction with this Technical Report.
Brent Williams, Jayawant N. Mandrekar, Sumithra J. Mandrekar, Stephen S. Cha, Alfred F. Furth [June 2006]



Estimating Genetic Components of Variance for Quantitative Traits in Family Studies using the MULTIC routines
This report provides an overview of the theory behind the variance components approach for analyzing one or more quantitative traits in the face of familial correlation. It also provides an introduction to the Splus/R multic library, which contains software to carry out this analysis.
Mariza de Andrade, Elizabeth J. Atkinson, Eric Lunde, Christopher I. Amos, Jianfang Chen [April 2006]



Joint Estimation of Calibration and Expression for High-Density Oligonucleotide Arrays
We present a unified algorithm which incorporates normalization and class comparison in one analysis using probe level perfect match and mismatch data. The algorithm is based on calibration models common to most biological assays, and the resulting chip-specific parameters have a natural interpretation. We show that the algorithm fits into the statistical generalized linear models framework, describe a practical fitting strategy and present results of the algorithm based on commonly used metrics.
Ann L. Oberg, Douglas W. Mahoney, Karla V. Ballman, Terry M. Therneau [February 2006]



Evaluation of a Simultaneous Mass-Calibration and Peak-Detection Algorithm for FT-ICR Mass Spectrometry
Electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI-FT-ICR-MS) is a potentially superior biomarker discovery platform because it offers high mass-measurement accuracy and high mass-measurement precision as well as high resolving power over a broad mass-to-charge range. Herein, we describe and evaluate a simultaneous mass-calibration and peak-detection algorithm that exploits resolved isotopic peak-spacing information as well as space-charge frequency shifts across isotopic clusters that represent the same molecular species but differ in charge states by integer values.
Jeanette E. Eckel-Passow, Terry M. Therneau, Ann L. Oberg, Christopher J. Mason, David C. Muddiman [January 2006]



Why does PLIER really work?
The PLIER (Probe Logarithmic Intensity ERror) algorithm was developed by Affymetrix and released in 2004. It is part of several commercially available software packages that analyze GeneChip data, such as Strand Genomics' Avadis and Stratagene's ArrayAssist. The PLIER algorithm produces an improved gene expression value (a summary value for a probe set) for the GeneChip microarray platform as compared to the Affymetrix MAS5 algorithm. In this report, we look at why the PLIER algorithm performs so well given that its derivation is based on a biologically implausible assumption.
Terry M. Therneau and Karla V. Ballman [November 2005]



An Exploration of Affymetrix Probe-Set Intensities in Spike-In Experiments
In this report, we look at the characteristics of the relationship between the observed probe intensity values produced by the Affymetrix GeneChip platform and the known concentration level of the target gene. This is done using data from three publicly available spike-in gene experiments. The report discusses characteristics of the relationship and implications for statistical models and analysis of Affymetrix GeneChip data. We learned a considerable amount from looking at plots of the data, which are provided in the appendices (Appendix A, Appendix B, and Appendix C), and encourage readers to look and learn from the data themselves.
Karla V. Ballman and Terry M. Therneau [July 2005]



Evaluating Methods of Symmetry
Knowing the symmetry of the underlying data is important for parametric analysis, fitting distributions or doing transformations to the data. We evaluate five different methods to assess skewness. We have also developed a comprehensive and efficient SAS macro for computing the various skewness measures and the appropriate power transformation, if one exists, to make an asymmetric distribution symmetric.
Mandrekar JN, Mandrekar SJ, and Cha SS [January 2005]



Transmission Disequilibrium Methods for Family-Based Studies
The study of the association of genetic markers with complex traits has generated a wide range of statistical methods, particularly those that are based on transmission-disequilibrium. This report provides a review of methods in this area through approximately 1999.
Schaid DJ [JUL 2004]



Duane's Little Handbook of Advice for Young Biostatisticians on How to Work with Investigators
This handbook is intended to provide young biostatisticians with a set of guidelines about how to work effectively with investigators. Not all of these guidelines will work well in every consulting situation. You may find that you develop better ways to deal with some situations than those given here. The advice given here should, however, help you to at least formulate for yourself how you should conduct your own consultations.
Ilstrup DM [AUG 2004]



Normalization of Two-Channel Microarray Experiments: A Semiparametric Approach
An important underlying assumption of any experiment is that the experimental subjects are similar across levels of the treatment variable, so that changes in the response variable can be attributed to exposure to the treatment under study. This assumption is often not valid in the analysis of a microarray experiment due to systematic biases in the measured expression levels related to experimental factors such as spot location (often referred to as a print-tip effect), arrays, dyes, and various interactions of these effects. Thus, normalization is a critical initial step in the analysis of a microarray experiment, where the objective is to balance the individual signal intensity levels across the experimental factors, while maintaining the effect due to the treatment under investigation.
Burgoon LD, Boverhof DR, Eckel JE, Gennings C, Therneau TM, Zacharewski TR [JUL 2004]



Faster cyclic loess: normalizing DNA arrays via linear models
This technical report describes a normalization technique that yields results similar to cyclic loess normalization with speed comparable to quantile normalization.
Ballman KV, Grill DE, Oberg AL, Therneau TM [NOV 2003]



An Introduction to Multiple Imputation Methods: Handling Missing Data with SAS V8.2
This report is organized to give a general overview of the basic concepts of data imputation, with emphasis on application. The purpose is to explain the basic principles of multiple imputation for handling missing data and how to implement this method using SAS version 8.2.
Vargas-Chanes D, Decker PA, Schroeder DR, and Offord KP [JULY 2003]



Penalized Survival Models and Frailty
We demonstrate that solutions for gamma shared frailty models can be obtained exactly via penalized estimation. Similarly, Gaussian frailty models are closely linked to penalized models. This makes it possible to apply penalized estimation to other frailty models using Laplace approximations. Fitting frailty models with penalized likelihoods can be made quite rapid by taking advantage of computational methods available for penalized models. We have implemented penalized regression for the coxph function of Splus and illustrate the algorithms with examples using the Cox model.
Therneau TM, Grambsch PM, and Pankratz VS [JUNE 2000]



MCSTRAT: A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design
A case-control design is a common approach used to assess disease-exposure relationships, and the logistic regression model is the most common framework for the analysis of such data. This model expresses the logit transform of the disease probability as a linear combination of independent, or exposure, variables.
Vierkant RA, Kosanke JL, Therneau TM, and Naessens JM [FEB 2000]



Calculating Incidence Rates Among Hospitalized Residents of Olmsted County, Minnesota.
The purpose of this technical report is to describe the SAS macro, %inchosp, which allows users to calculate the incidence rate of any disease or event among hospitalized residents of Olmsted County, Minnesota from 1980 to 1990, provided that the location at onset is collected.
Lohse CM, Petterson TM, O'Fallon WM, Melton LJ [FEB 1999]



Expected Survival Based on Hazard Rates (update)
This paper is an extension and update of Technical Report #52. An update to the rate tables themselves is based on the recently released data from the 1990 decennial census, which allowed us to replace extrapolated 1990 death rates with actual rates, and to improve the extrapolated year 2000 values. Much of the material in the prior report is contained here, in order to make this document useful on its own.
Therneau TM and Offord J [Feb 1999]



Computing the Cox Model for Case Cohort Designs
Prentice proposed a case-cohort design as an efficient subsampling mechanism for survival studies. Several other authors have expanded on these ideas to create a family of related sampling plans, along with estimators for the covariate effects. We describe how to obtain the proposed parameter estimates and their variance estimates using standard software packages, with SAS and S-Plus as particular examples.
Therneau TM and Li H [July 1998]



An Introduction to Recursive Partitioning Using the RPART Routines
A short overview of the methods found in the rpart routines, which implement many of the ideas found in the CART (Classification and Regression Trees) book and programs of Breiman, Friedman, Olshen and Stone.
Therneau TM and Atkinson EJ [Nov 1997]

RPART Condensed Version


This document is a shortened version of technical report #61 focusing on the examples and the function options.
Atkinson EJ and Therneau TM [Feb 2000]



Extending the Cox Model
Since its introduction, the proportional hazards model proposed by Cox has become the workhorse of regression analysis for censored data. In the last several years, the theoretical basis for the model has been solidified by connecting it to the study of counting processes and martingale theory. Comprehensive accounts of the underlying mathematics are given in the books of Fleming and Harrington and of Andersen et al. These developments have, in turn, led to the introduction of several new extensions of the original model. These include the analysis of residuals, time varying covariates, time dependent coefficients, multiple/correlated observations, multiple time scales, time dependent strata, and estimation of underlying hazard functions. The aim of this monograph is to show how many of these methods and extensions of the model can be approached using standard statistical software, in particular the S-Plus and SAS packages. As such, it should be a bridge between the statistical journals and actual practice.
Therneau TM [June 1996]



How many stratification factors are "too many" to use in a randomization plan?
Controlled Clinical Trials, 14:98-108, 1993
The issue of stratification and its role in patient assignment has generated much discussion, mostly focused on its importance to a study or lack thereof. This report focuses on a much narrower problem: assuming that stratified assignment is desired, how many factors can be accommodated? This is investigated for two methods of balanced patient assignment: the first is based on the minimization method of Taves and the second on the commonly used method of stratified assignment. Simulation results show that the former method can accommodate a large number of factors (10-20) without difficulty, but that the latter begins to fail if the total number of distinct combinations of factor levels is greater than approximately n/2. The two methods are related to a linear discriminant model, which helps to explain the results.
Therneau TM [1993]
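
To make the minimization idea concrete, the sketch below assigns a new patient to the arm that currently has the fewest prior patients sharing the new patient's factor levels, summed over factors, with ties broken at random. This is a simplified reading of minimization under stated assumptions (equal factor weights, a simple count as the imbalance measure); the function name and data layout are hypothetical and are not taken from the report.

```python
import random


def minimization_assign(new_levels, history, arms=("A", "B"), rng=None):
    """Choose a treatment arm for a new patient by minimization.

    new_levels: dict mapping factor name to the new patient's level
    history:    list of (arm, levels) pairs for previously assigned patients
    arms:       available treatment arms
    rng:        optional random.Random instance used to break ties
    """
    rng = rng or random.Random()
    # For each arm, count prior patients who match the new patient on each factor.
    totals = {arm: 0 for arm in arms}
    for arm, levels in history:
        for factor, level in new_levels.items():
            if levels.get(factor) == level:
                totals[arm] += 1
    # Assign to the arm with the smallest total; break ties at random.
    best = min(totals.values())
    candidates = [arm for arm in arms if totals[arm] == best]
    return rng.choice(candidates)


# Hypothetical trial history with two stratification factors.
history = [
    ("A", {"sex": "F", "site": 1}),
    ("A", {"sex": "F", "site": 2}),
    ("B", {"sex": "M", "site": 1}),
]
arm = minimization_assign({"sex": "F", "site": 1}, history)
```

Because only marginal counts per factor are balanced, rather than every combination of levels, the number of factors can grow without creating many sparsely filled strata.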



Computerized matching of cases to controls
The purpose of this report is to describe a new SAS macro, %match, written to facilitate the matching of cases to controls, where one case is matched to one or more controls.
Bergstralh EJ and Kosanke JL [April 1995]



Extrapolation of the U.S. Life Tables
Therneau TM & Scheib C [October 1994]



Generalized Population Attributable Risk Estimation
Kahn MJ, O'Fallon WM, Sicks JD [April 1994]



A package for survival analysis in S
Therneau TM [June 1994]



Expected survival based on hazard rates
Therneau TM, Sicks JD, Bergstralh EJ, and Offord J [March 1994]


The new PROC paired
Grambsch PM and Therneau TM [Feb. 1993]



The new PROC paired
Bergstralh EJ, Offord KP, and Kosanke JL [Oct. 1992]



A numerical solution for text information retrieval and its application in patient care classification.
Yang Y and Chute C [Feb. 1992]



Calculating incidence, prevalence and mortality rates in Olmsted County, Minnesota: An update
Bergstralh EJ, Offord KP, Chu CP, Beard CM, O'Fallon WM, and Melton LJ [April 1992]



A random survey of Olmsted County, Minnesota, 1973
O'Brien PD [Mar 1991]



The GLIM procedure: An interface to the SAS system
Grambsch PM, Kosanke JL, Therneau TM, Schaid DJ, Zinsmeister AR, Wieand HS, Offord KP, LarsonKeller [Mar. 1990]



Simple robust tests for comparing dispersion in bivariate data
Grambsch PM [Dec. 1989]



Optimal two-stage screening designs for survival comparisons
Schaid DJ, Wieand HS and Therneau TM [Nov. 1988]



Robust procedures for testing equality of covariance matrices
O'Brien PC [Oct. 1988]



A SAS macro for comparing covariance matrices
O'Brien PC and Stertz CD [Oct. 1988]



A SAS macro for validating stepwise regression
O'Brien PC and Kosanke JL [Oct. 1988]



A SAS macro for regression
O'Brien PC, Stertz CD, Bergstralh EJ, Daood SL and Offord KP [Oct. 1988]



Martingale based residuals for survival models
Therneau TM, Grambsch PM and Fleming TR [Apr. 1988]



The effects of preliminary tests for nonlinearity in regression
Grambsch PM and O'Brien PC [Mar. 1988]



Projected Rochester and Olmsted County populations for 1981-1995
Bergstralh EJ and Offord KP [Feb. 1988]



Conditional probabilities used in calculating cohort expected survival
Bergstralh EJ and Offord KP [Jan. 1988]



Enumerating the optimal designs for a Phase II trial
Therneau TM, Wieand HS and Chang M [Sept. 1987]



Use of an Apple II+ personal computer to enter and code diagnostic data in a research setting
Beard CM and Goss S [Apr. 1987]



A two-stage design for randomized trials with binary outcomes
Wieand HS and Therneau TM [Oct. 1986]



Designs for group sequential Phase II clinical trials
Chang M, Therneau TM, Wieand HS, Cha S [July 1986]



Proc Twosample: A SAS Procedure for the two-sample t and rank-sum tests with extensions
O'Brien PC, Offord KP, Kosanke JL [Nov. 1987]



PERSONYRS: A SAS procedure for person year analyses
Bergstralh E, Offord KP, Kosanke JL, and Augustine G [Apr. 1986]



Comparing two samples: Extensions of the t, rank sum, and log rank tests
O'Brien PC [Nov. 1985]



Naessens JM, Offord KP, Scott WF, and Daood SL [Oct. 1984]



Fleming TR, Augustine GA, Elcombe SA, Offord KP [Nov. 1984]



Procedures for testing efficacy for clinical trials with multiple endpoints
O'Brien PC [Sept. 1983]



Designs for group sequential tests
Fleming TR, Harrington DP, O'Brien PC [Apr. 1984]



On robust estimation of location for arbitrarily right-censored data
Green SJ and Crowley J [Feb. 1983]



Statistical methods for analyzing variables measured repeatedly over time
Zinsmeister AR [Being revised]



A runs test based on run lengths
O'Brien PC [Jan. 1983]



A SAS MACRO which utilizes local and reference population counts appropriate for incidence, prevalence, and mortality rate calculations in Rochester and Olmsted County, Minnesota
Schroeder DJ and Offord KP [Aug. 1982]



Performing serial testing of treatment effects
Fleming TR, Green SJ, and Harrington DP [July 1982]



Reprint file management using SAS
Davis C [June 1981]



A SAS MACRO for calculating the quadratic discriminant function
Davis C [June 1981]



A SAS MACRO for EDF goodness-of-fit tests
Davis C [June 1981]



The usefulness of mathematical models in studying observational data
O'Brien PC [Mar. 1981]



A data management model for small-scale multi-center studies
Tilley BC, Offord KP and Oenning R [Feb. 1981]



One sample multiple testing procedure for Phase II clinical trials
Fleming TR [Feb. 1981]



A class of rank test procedures for censored survival data
Harrington DP and Fleming TR [Feb. 1981]



Adjusting significance levels in censored data when using multiple tests simultaneously
Fleming TR and Harrington DP [Dec. 1980]



An investigation into the operating characteristics of some two-sample nonparametric test procedures used for censored survival data
Fleming TR and Harrington DP [Aug. 1980]



A class of hypothesis tests for one and two sample censored survival data
Fleming TR and Harrington DP [Aug. 1980]



Nonparametric estimation of the survival distribution in censored data
Fleming TR and Harrington DP [Aug. 1979]



A likelihood test for multivariate serial correlation
O'Brien PC [May 1979]



A multi-stage procedure for clinical trials
O'Brien PC and Fleming TR [Oct. 1978]



A system for storage and retrieval of echocardiographic data
Offord KP, Weber VP, Augustine GA, Giuliani ER [Apr. 1978]

#3 & 6


Modified Kolmogorov-Smirnov test procedures with application to arbitrarily right censored data
Fleming TR, O'Fallon JR, O'Brien PC, Harrington DP [Aug. 1980]



A system for assuring protocol adherence in clinical trials
Golenzer H, Taylor WF, O'Fallon JR, and Silvers A [Mar. 1978]



Asymptotic efficiencies for some nonparametric tests of association with censored data
O'Brien PC [May 1978]