TY - GEN
T1 - Computational tool for large-scale GPIomic analysis
AU - Aguilar-Bonavides, Clemente
AU - Lopes, Felipe G.
AU - Leung, Ming Ying
AU - Nakayasu, Ernesto S.
AU - Almeida, Igor C.
PY - 2012
Y1 - 2012
AB - Glycosylphosphatidylinositol (GPI)-anchored proteins are involved in many biological processes and are of medical importance. The identification and analysis of the entire collection of free and protein-linked GPIs within an organism (i.e., GPIomics) requires highly sensitive instruments. At present, liquid chromatography-tandem mass spectrometry (LC-MS/MS or MSn) is the most efficient laboratory technique for these tasks. As a typical LC-MS/MS experiment produces hundreds of thousands of spectra, data analysis creates a major bottleneck in high-throughput GPIomic projects. Yet, no computational tool for characterizing the chemical structures of GPIs is available to date. We propose a library-search algorithm to identify GPIs by matching fragment peaks in the spectra with molecular masses derived from a collection of theoretical GPI structures constructed from the properties of all currently known GPIs. The algorithm involves matching the mass-to-charge (m/z) ratios of the parent ions and fragments obtained from the observed spectra to those of the theoretical structures in the library. A theoretically possible GPI structure is assessed by a scoring scheme that incorporates its fitness values for individual observed spectra as well as how frequently it is considered a good fit. The algorithm has been tested on a set of experimentally confirmed GPIs from the parasite Trypanosoma cruzi. The final list of 4686 predicted GPI candidates contains 76 of the 78 known structures in the test set. Over 70% of the known structures have fitness values among the top 647. This computational tool is expected to accelerate the discovery and characterization of GPI molecules, increasing the number of experimentally confirmed GPI structures. This will, in turn, help us further develop the algorithm to reduce the number of false positives.
KW - GPI
KW - GPIomics
KW - Glycolipid
KW - Mass spectrometry
KW - Trypanosoma cruzi
UR - http://www.scopus.com/inward/record.url?scp=84869429129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869429129&partnerID=8YFLogxK
U2 - 10.1145/2382936.2383029
DO - 10.1145/2382936.2383029
M3 - Conference contribution
AN - SCOPUS:84869429129
SN - 9781450316705
T3 - 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
SP - 585
EP - 587
BT - 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
T2 - 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
Y2 - 7 October 2012 through 10 October 2012
ER -