TY - GEN
T1 - Directed-information based feature-selection for tissue-specific sequences
AU - Rao, Arvind
AU - Hero, Alfred O.
AU - States, David J.
AU - Engel, James Douglas
N1 - Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often provides valuable clues to the discovery of novel motifs (including transcription factor sites) with uncharacterized function in gene expression. Given the complexity underlying tissue-specific gene expression, several motifs are putatively responsible for gene expression in a certain cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work we present an approach to the identification of motifs (not necessarily transcription factors) and examine its application to several questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific genomic regions from those that are not tissue-specific. We propose the use of directed information for such classification-constrained feature selection, and then use the selected features with a support vector machine (SVM) classifier to characterize the tissue-specificity of any sequence of interest. This analysis yields several novel and interesting motifs that merit further experimental characterization. The last part of this paper presents a framework for exploring the relationship between such discriminatory transcription factor motifs and the corresponding tissue-specificity, using both sequence and expression modalities.
KW - Comparative genomics
KW - Directed information
KW - Tissue-specific genes
KW - Transcriptional regulation
UR - http://www.scopus.com/inward/record.url?scp=47849108996&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47849108996&partnerID=8YFLogxK
U2 - 10.1109/SSP.2007.4301249
DO - 10.1109/SSP.2007.4301249
M3 - Conference contribution
AN - SCOPUS:47849108996
SN - 142441198X
SN - 9781424411986
T3 - IEEE Workshop on Statistical Signal Processing Proceedings
SP - 210
EP - 214
BT - 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, SSP 2007, Proceedings
T2 - 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, SSP 2007
Y2 - 26 August 2007 through 29 August 2007
ER -