Test-Based Variable Selection Via Cross-Validation

Peter F. Thall; Richard Simon; David A. Grier

doi:10.1080/10618600.1992.10474575

Biostatistics

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Test-based variable selection algorithms in regression often are based on sequential comparison of test statistics to cutoff values. A predetermined α level typically is used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise involve comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p value α*, which minimizes a cross-validation estimate of squared prediction error, then selecting the model by running backward elimination on the entire data set using α* as the nominal p value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.
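The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`ols_pvalues`, `backward_eliminate`, `cv_error`, `becv`), the candidate grid of α values, the choice of 5 folds, the simulated data, and the use of a standard normal approximation to the t reference distribution are all assumptions made for illustration.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Fit OLS with an intercept; return coefficients and two-sided p-values
    for the predictors (normal approximation to the t distribution: an
    assumption made to keep the sketch dependency-light)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - p - 1)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    tstat = beta[1:] / se[1:]
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in tstat])
    return beta, pvals

def backward_eliminate(X, y, alpha):
    """Repeatedly drop the predictor with the largest p-value above alpha."""
    active = list(range(X.shape[1]))
    while active:
        _, pvals = ols_pvalues(X[:, active], y)
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break
        active.pop(worst)
    return active

def cv_error(X, y, alpha, folds):
    """Cross-validation estimate of mean squared prediction error when
    backward elimination at level alpha is rerun on each training split."""
    sse = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        active = backward_eliminate(X[train], y[train], alpha)
        beta, _ = ols_pvalues(X[train][:, active], y[train])
        pred = beta[0] + X[test][:, active] @ beta[1:]
        sse += float(np.sum((y[test] - pred) ** 2))
    return sse / len(y)

def becv(X, y, alphas, k=5, seed=0):
    """Pick the alpha minimizing CV error, then select on the full data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = [cv_error(X, y, a, folds) for a in alphas]
    alpha_star = alphas[int(np.argmin(errors))]
    return alpha_star, backward_eliminate(X, y, alpha_star)

# Hypothetical simulated data: only the first two of six predictors matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=120)
alphas = [0.01, 0.05, 0.10, 0.20, 0.50]
alpha_star, selected = becv(X, y, alphas)
print("alpha*:", alpha_star, "selected columns:", selected)
```

The same folds are reused for every candidate α so that the cross-validation errors are compared on identical splits. The article itself may use a different grid, fold count, or test statistic.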

Original language	English (US)
Pages (from-to)	41-61
Number of pages	21
Journal	Journal of Computational and Graphical Statistics
Volume	1
Issue number	1
DOIs	https://doi.org/10.1080/10618600.1992.10474575
State	Published - Mar 1992

Keywords

K-fold cross-validation
Backward elimination
Computer simulation
Multiple linear regression
Stepwise selection
Variable selection

ASJC Scopus subject areas

Statistics and Probability
Discrete Mathematics and Combinatorics
Statistics, Probability and Uncertainty

Access to Document

10.1080/10618600.1992.10474575

Cite this

@article{e65ecf4717db4437be9111d5365d49e1,

title = "Test-Based Variable Selection Via Cross-Validation",

abstract = "Test-based variable selection algorithms in regression often are based on sequential comparison of test statistics to cutoff values. A predetermined α level typically is used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise involve comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p value α*, which minimizes a cross-validation estimate of squared prediction error, then selecting the model by running backward elimination on the entire data set using α* as the nominal p value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.",

keywords = "K-fold cross-validation, Backward elimination, Computer simulation, Multiple linear regression, Stepwise selection, Variable selection",

author = "Thall, {Peter F.} and Simon, Richard and Grier, {David A.}",

year = "1992",

month = mar,

doi = "10.1080/10618600.1992.10474575",

language = "English (US)",

volume = "1",

pages = "41--61",

journal = "Journal of Computational and Graphical Statistics",

issn = "1061-8600",

publisher = "American Statistical Association",

number = "1",

}

TY - JOUR

T1 - Test-Based Variable Selection Via Cross-Validation

AU - Thall, Peter F.

AU - Simon, Richard

AU - Grier, David A.

PY - 1992/3

Y1 - 1992/3

N2 - Test-based variable selection algorithms in regression often are based on sequential comparison of test statistics to cutoff values. A predetermined α level typically is used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise involve comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p value α*, which minimizes a cross-validation estimate of squared prediction error, then selecting the model by running backward elimination on the entire data set using α* as the nominal p value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.

AB - Test-based variable selection algorithms in regression often are based on sequential comparison of test statistics to cutoff values. A predetermined α level typically is used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise involve comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p value α*, which minimizes a cross-validation estimate of squared prediction error, then selecting the model by running backward elimination on the entire data set using α* as the nominal p value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.

KW - K-fold cross-validation

KW - Backward elimination

KW - Computer simulation

KW - Multiple linear regression

KW - Stepwise selection

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=8644248054&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=8644248054&partnerID=8YFLogxK

U2 - 10.1080/10618600.1992.10474575

DO - 10.1080/10618600.1992.10474575

M3 - Article

AN - SCOPUS:8644248054

SN - 1061-8600

VL - 1

SP - 41

EP - 61

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

IS - 1

ER -
