Adapting support vector machines to predict translation initiation sites in the human genome

Rehan Akbani; Stephen Kwek

doi:10.1109/CSBW.2005.18

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

This study is concerned with predicting Translation Initiation Sites (TIS) in the human genome that start with the nucleotide sequence ATG. This sequence occurs 104 million times in the entire genome. However, current estimates suggest that there are only about 30,000 TIS in the human genome, giving an imbalance ratio of about 1:3500 for TIS ATG vs. non-TIS ATG sites. Algorithms designed using datasets with a low imbalance ratio may not be well suited to predicting TIS at the genomic level. In this paper, we modify the SVM algorithm so that it can handle a moderately high imbalance ratio. The F-measures for the other approaches were: Linear Discriminant 0%, SVM with under-sampling 4.1%, SVM with over-sampling 8.2%, Neural Network 13.3%, and Decision Tree 20%, compared with 44% for our approach. This shows how poorly standard approaches perform at the genomic level due to the high imbalance ratio. Our approach improves the performance significantly.
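The imbalance ratio and the F-measure quoted in the abstract can be illustrated with a short sketch. It assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function names `imbalance_ratio` and `f_measure` are hypothetical helpers, not part of the authors' method.

```python
def imbalance_ratio(positives: int, total: int) -> float:
    """Number of negative examples per positive example."""
    return (total - positives) / positives


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (assumed F1): harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives means precision and/or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
atg_sites = 104_000_000  # occurrences of ATG in the genome
tis_sites = 30_000       # estimated number of true TIS

ratio = imbalance_ratio(tis_sites, atg_sites)
print(f"imbalance ratio is roughly 1:{ratio:.0f}")  # the abstract rounds this to 1:3500

# A classifier that labels every ATG as non-TIS is almost always "right",
# yet it finds no TIS at all, so its F-measure is zero.
accuracy = (atg_sites - tis_sites) / atg_sites
print(f"accuracy {accuracy:.4%}, F-measure {f_measure(0, 0, tis_sites):.0%}")
```

Under these assumptions, the sketch shows why accuracy is uninformative at this imbalance ratio and why an F-measure of 0% is what a classifier gets when it never predicts the positive class.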

Original language	English (US)
Title of host publication	2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts
Pages	143-145
Number of pages	3
DOIs	https://doi.org/10.1109/CSBW.2005.18
State	Published - 2005
Externally published	Yes
Event	2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts - Stanford, CA, United States
Duration	Aug 8 2005 → Aug 11 2005

Publication series

Name	2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts

Other

Other	2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts
Country/Territory	United States
City	Stanford, CA
Period	8/8/05 → 8/11/05

ASJC Scopus subject areas

General Engineering

Access to Document

10.1109/CSBW.2005.18

Cite this

Akbani, R., & Kwek, S. (2005). Adapting support vector machines to predict translation initiation sites in the human genome. In 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts (pp. 143-145). Article 1540576 (2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts). https://doi.org/10.1109/CSBW.2005.18

Adapting support vector machines to predict translation initiation sites in the human genome. / Akbani, Rehan; Kwek, Stephen.
2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts. 2005. p. 143-145 1540576 (2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts).

Akbani, R & Kwek, S 2005, Adapting support vector machines to predict translation initiation sites in the human genome. in 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts., 1540576, 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts, pp. 143-145, 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts, Stanford, CA, United States, 8/8/05. https://doi.org/10.1109/CSBW.2005.18

@inproceedings{4c62b8af9eb44abb892ac0498aa98658,

title = "Adapting support vector machines to predict translation initiation sites in the human genome",

abstract = "This study is concerned with predicting Translation Initiation Sites (TIS) in the human genome that start with the nucleotide sequence ATG. This sequence occurs 104 million times in the entire genome. However, current estimates suggest that there are only about 30,000 TIS in the human genome, giving an imbalance ratio of about 1:3500 for TIS ATG vs. non-TIS ATG sites. Algorithms designed using datasets with a low imbalance ratio may not be well suited to predicting TIS at the genomic level. In this paper, we modify the SVM algorithm so that it can handle a moderately high imbalance ratio. The F-measures for the other approaches were: Linear Discriminant 0%, SVM with under-sampling 4.1%, SVM with over-sampling 8.2%, Neural Network 13.3%, and Decision Tree 20%, compared with 44% for our approach. This shows how poorly standard approaches perform at the genomic level due to the high imbalance ratio. Our approach improves the performance significantly.",

author = "Rehan Akbani and Stephen Kwek",

note = "Copyright: Copyright 2011 Elsevier B.V., All rights reserved.; 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts ; Conference date: 08-08-2005 Through 11-08-2005",

year = "2005",

doi = "10.1109/CSBW.2005.18",

language = "English (US)",

isbn = "0769524427",

series = "2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts",

pages = "143--145",

booktitle = "2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts",

}

TY - GEN

T1 - Adapting support vector machines to predict translation initiation sites in the human genome

AU - Akbani, Rehan

AU - Kwek, Stephen

PY - 2005

Y1 - 2005

AB - This study is concerned with predicting Translation Initiation Sites (TIS) in the human genome that start with the nucleotide sequence ATG. This sequence occurs 104 million times in the entire genome. However, current estimates suggest that there are only about 30,000 TIS in the human genome, giving an imbalance ratio of about 1:3500 for TIS ATG vs. non-TIS ATG sites. Algorithms designed using datasets with a low imbalance ratio may not be well suited to predicting TIS at the genomic level. In this paper, we modify the SVM algorithm so that it can handle a moderately high imbalance ratio. The F-measures for the other approaches were: Linear Discriminant 0%, SVM with under-sampling 4.1%, SVM with over-sampling 8.2%, Neural Network 13.3%, and Decision Tree 20%, compared with 44% for our approach. This shows how poorly standard approaches perform at the genomic level due to the high imbalance ratio. Our approach improves the performance significantly.

UR - http://www.scopus.com/inward/record.url?scp=33749080543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749080543&partnerID=8YFLogxK

U2 - 10.1109/CSBW.2005.18

DO - 10.1109/CSBW.2005.18

M3 - Conference contribution

AN - SCOPUS:33749080543

SN - 0769524427

SN - 9780769524429

T3 - 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts

SP - 143

EP - 145

BT - 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts

T2 - 2005 IEEE Computational Systems Bioinformatics Conference, Workshops and Poster Abstracts

Y2 - 8 August 2005 through 11 August 2005

ER -
