Data mining

L. Adrienne Cupples; Julia Bailey; Kevin C. Cartier; Catherine T. Falk; Kuang Yu Liu; Yuanqing Ye; Robert Yu; Heping Zhang; Hongyu Zhao

doi:10.1002/gepi.20117

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the "story" in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops.

Original language	English (US)
Pages (from-to)	S103-S109
Journal	Genetic epidemiology
Volume	29
Issue number	SUPPL.
DOIs	https://doi.org/10.1002/gepi.20117
State	Published - 2005

Keywords

Association studies
Haplotype estimation
Machine learning
Neural networks
Trees

ASJC Scopus subject areas

Epidemiology
Genetics (clinical)

Cite this

@article{c40bc0100f3140d8b8ec11aa87d4cf63,
  title = "Data mining",
  abstract = "Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the {"}story{"} in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops.",
  keywords = "Association studies, Haplotype estimation, Machine learning, Neural networks, Trees",
  author = "Cupples, {L. Adrienne} and Julia Bailey and Cartier, {Kevin C.} and Falk, {Catherine T.} and Liu, {Kuang Yu} and Yuanqing Ye and Robert Yu and Heping Zhang and Hongyu Zhao",
  year = "2005",
  doi = "10.1002/gepi.20117",
  language = "English (US)",
  volume = "29",
  pages = "S103--S109",
  journal = "Genetic epidemiology",
  issn = "0741-0395",
  publisher = "Wiley-Liss Inc.",
  number = "SUPPL.",
}

TY - JOUR
T1 - Data mining
AU - Cupples, L. Adrienne
AU - Bailey, Julia
AU - Cartier, Kevin C.
AU - Falk, Catherine T.
AU - Liu, Kuang Yu
AU - Ye, Yuanqing
AU - Yu, Robert
AU - Zhang, Heping
AU - Zhao, Hongyu
PY - 2005
Y1 - 2005
N2 - Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the "story" in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops.
AB - Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the "story" in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops.
KW - Association studies
KW - Haplotype estimation
KW - Machine learning
KW - Neural networks
KW - Trees
UR - http://www.scopus.com/inward/record.url?scp=30344484272&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=30344484272&partnerID=8YFLogxK
U2 - 10.1002/gepi.20117
DO - 10.1002/gepi.20117
M3 - Article
C2 - 16342179
AN - SCOPUS:30344484272
SN - 0741-0395
VL - 29
SP - S103
EP - S109
JO - Genetic epidemiology
JF - Genetic epidemiology
IS - SUPPL.
ER -
