Test-Based Variable Selection Via Cross-Validation

Peter F. Thall, Richard Simon, David A. Grier

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Test-based variable selection algorithms in regression often are based on sequential comparison of test statistics to cutoff values. A predetermined α level typically is used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise selection involves comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi-square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p value α*, which minimizes a cross-validation estimate of squared prediction error, then selecting the model by running backward elimination on the entire data set using α* as the nominal p value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.
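The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made here for illustration: α* is searched over a small user-supplied grid, the folds come from a random K-fold split, p values use a standard normal approximation to the t reference distribution, and the function names (`ols_p_values`, `backward_eliminate`, `cv_error`, `becv`) are hypothetical.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Fit OLS with an intercept; return (coefficients, p values of the slopes)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.pinv(A.T @ A)))
    # Assumption: two-sided p values from a normal approximation to the t distribution.
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in beta / se])
    return beta, p[1:]

def backward_eliminate(X, y, alpha):
    """Drop the least significant variable until every p value is <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p = ols_p_values(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            break
        keep.pop(worst)
    return keep

def cv_error(X, y, alpha, folds):
    """Cross-validation estimate of mean squared prediction error for one alpha."""
    sse = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        keep = backward_eliminate(X[train], y[train], alpha)
        beta, _ = ols_p_values(X[train][:, keep], y[train])
        pred = beta[0] + X[test_idx][:, keep] @ beta[1:]
        sse += float(np.sum((y[test_idx] - pred) ** 2))
    return sse / len(y)

def becv(X, y, alphas, k=5, seed=0):
    """Pick alpha* minimizing CV error, then rerun elimination on all the data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = [cv_error(X, y, a, folds) for a in alphas]
    alpha_star = alphas[int(np.argmin(errors))]
    return alpha_star, backward_eliminate(X, y, alpha_star)

# Simulated data (illustrative only): two informative predictors out of six.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
alpha_star, selected = becv(X, y, alphas=[0.01, 0.05, 0.10, 0.20, 0.50])
print(alpha_star, selected)
```

Note that the elimination step is repeated inside each training fold, so the cross-validation error reflects the whole selection procedure at a given α rather than a single fixed model.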

Original language: English (US)
Pages (from-to): 41-61
Number of pages: 21
Journal: Journal of Computational and Graphical Statistics
Volume: 1
Issue number: 1
State: Published - Mar 1992

Keywords

  • K-fold cross-validation
  • Backward elimination
  • Computer simulation
  • Multiple linear regression
  • Stepwise selection
  • Variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty
