TY - JOUR
T1 - Effective function annotation through catalytic residue conservation
AU - George, Richard A.
AU - Spriggs, Ruth V.
AU - Bartlett, Gail J.
AU - Gutteridge, Alex
AU - MacArthur, Malcolm W.
AU - Porter, Craig T.
AU - Al-Lazikani, Bissan
AU - Thornton, Janet M.
AU - Swindells, Mark B.
PY - 2005/8/30
Y1 - 2005/8/30
N2 - Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-D-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.
AB - Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-D-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.
KW - EC
KW - Enzymes
KW - Function prediction
KW - PSI-BLAST
UR - http://www.scopus.com/inward/record.url?scp=24644517567&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=24644517567&partnerID=8YFLogxK
U2 - 10.1073/pnas.0504833102
DO - 10.1073/pnas.0504833102
M3 - Article
C2 - 16037208
AN - SCOPUS:24644517567
SN - 0027-8424
VL - 102
SP - 12299
EP - 12304
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 35
ER -