TY - GEN
T1 - Multiple sequence alignment based on profile alignment of intermediate sequences
AU - Lu, Yue
AU - Sze, Sing Hoi
N1 - Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - Despite considerable efforts, it remains difficult to obtain accurate multiple sequence alignments. By using additional hits from database search of the input sequences, a few strategies have been proposed to significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high scoring hits into the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. We develop an algorithm that integrates these strategies to further improve alignment accuracy by modifying the pairHMM approach in ProbCons to incorporate profiles of intermediate sequences from database search and utilize secondary structure predictions as in SPEM. We test our algorithm on a few sets of benchmark multiple alignments, including BAliBASE, HOMSTRAD, PREFAB and SAB-mark, and show that it significantly outperforms MAFFT and ProbCons, which are among the best multiple alignment algorithms that do not utilize additional information, and SPEM, which is among the best multiple alignment algorithms that utilize additional hits from database search. The improvement in accuracy over SPEM can be as much as 5 to 10% when aligning divergent sequences. A software program that implements this approach (ISPAlign) is at http://faculty.cs.tamu.edu/shsze/ ispalign.
AB - Despite considerable efforts, it remains difficult to obtain accurate multiple sequence alignments. By using additional hits from database search of the input sequences, a few strategies have been proposed to significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high scoring hits into the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. We develop an algorithm that integrates these strategies to further improve alignment accuracy by modifying the pairHMM approach in ProbCons to incorporate profiles of intermediate sequences from database search and utilize secondary structure predictions as in SPEM. We test our algorithm on a few sets of benchmark multiple alignments, including BAliBASE, HOMSTRAD, PREFAB and SAB-mark, and show that it significantly outperforms MAFFT and ProbCons, which are among the best multiple alignment algorithms that do not utilize additional information, and SPEM, which is among the best multiple alignment algorithms that utilize additional hits from database search. The improvement in accuracy over SPEM can be as much as 5 to 10% when aligning divergent sequences. A software program that implements this approach (ISPAlign) is at http://faculty.cs.tamu.edu/shsze/ ispalign.
UR - http://www.scopus.com/inward/record.url?scp=34547401979&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547401979&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-71681-5_20
DO - 10.1007/978-3-540-71681-5_20
M3 - Conference contribution
AN - SCOPUS:34547401979
SN - 3540716807
SN - 9783540716808
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 283
EP - 295
BT - Research in Computational Molecular Biology - 11th Annual International Conference, RECOMB 2007, Proceedings
PB - Springer Verlag
T2 - 11th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2007
Y2 - 21 April 2007 through 25 April 2007
ER -