TY - JOUR
T1 - A cautionary note on using secondary phenotypes in neuroimaging genetic studies
AU - for the Alzheimer's Disease Neuroimaging Initiative
AU - Kim, Junghi
AU - Pan, Wei
N1 - Funding Information:
The authors are grateful to the reviewers for constructive comments. This research was supported by NIH grants R01GM113250, R01HL105397, R01HL116720 and R01GM081535, and by the Minnesota Supercomputing Institute.
Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2015/11/1
Y1 - 2015/11/1
AB - Almost all genome-wide association studies (GWASs), including the Alzheimer's Disease Neuroimaging Initiative (ADNI), are based on the case-control study design, implying that the resulting case-control data are likely a biased, not random, sample of the target population. Although association analysis of the disease (e.g., Alzheimer's disease in the ADNI) can be conducted using a standard logistic regression by ignoring the biased case-control sampling, a standard linear regression analysis on a secondary phenotype (e.g., any neuroimaging phenotype in the ADNI) may in general lead to biased inference, including biased parameter estimates, inflated Type I errors and reduced power for association testing. Despite this well-known result in genetic epidemiology, to our surprise, all the published studies on secondary phenotypes with the ADNI data have ignored this potential problem. Here we aim to answer whether such a standard analysis of a secondary phenotype is valid or problematic with the ADNI data. Through both real data analyses and simulation studies, we found that, strikingly, such an analysis was generally valid (with only small biases or slightly inflated Type I errors) for the ADNI data, though caution must be taken when analyzing other data. We also illustrate applications and possible problems of two methods specifically developed for valid analysis of secondary phenotypes.
KW - ADNI
KW - Biased sampling
KW - Case-control design
KW - GWAS
KW - Inverse probability weighting
KW - Linear regression
KW - Logistic regression
KW - SPREG
UR - http://www.scopus.com/inward/record.url?scp=84938691412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938691412&partnerID=8YFLogxK
U2 - 10.1016/j.neuroimage.2015.07.058
DO - 10.1016/j.neuroimage.2015.07.058
M3 - Article
C2 - 26220747
AN - SCOPUS:84938691412
SN - 1053-8119
VL - 121
SP - 136
EP - 145
JO - NeuroImage
JF - NeuroImage
ER -