
Journal of Computational Biology

A Dictionary-Based Approach for Gene Annotation

To cite this article:
Lior Pachter, Serafim Batzoglou, Valentin I. Spitkovsky, Eric Banks, Eric S. Lander, Daniel J. Kleitman, and Bonnie Berger. Journal of Computational Biology. July 2004, 6(3-4): 419-430. doi:10.1089/106652799318364.

Published in Volume 6, Issue 3-4: July 5, 2004

Author information

Lior Pachter
Department of Mathematics and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Serafim Batzoglou
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Valentin I. Spitkovsky
Department of Mathematics and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Eric Banks
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Eric S. Lander
Whitehead Institute and Biology Department, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Daniel J. Kleitman
Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Bonnie Berger
Department of Mathematics and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts.

ABSTRACT

This paper describes a fast, fully automated, dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. They support O(1)-time lookups of fixed-length tuples: amino acid 4-tuples for the OWL database and nucleotide 11-tuples for the dbEST database. These lookups make it possible to find rapidly, at every position of an input sequence, the longest match to the database sequences. Such matches are useful for locating segments shared between exons and for locating alternative splice sites, and they supply frequency data on long tuples for statistical analysis. The dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
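The abstract does not describe the dictionary layout, so the following is only a minimal sketch under stated assumptions. It assumes a hash map from each k-tuple to the list of (sequence index, offset) pairs where that tuple occurs; a Python dict gives expected O(1) lookups, which may differ from the data structure the authors actually used. The longest match at each query position is found by looking up the k-tuple starting there and extending every hit to the right. The function names `build_dictionary` and `longest_matches` are hypothetical; k = 4 would mirror the OWL protein dictionary and k = 11 the dbEST dictionary.

```python
from collections import defaultdict


def build_dictionary(db_seqs, k):
    """Hypothetical helper: map every k-tuple to the (sequence index, offset)
    pairs where it occurs in the database sequences."""
    index = defaultdict(list)
    for sid, seq in enumerate(db_seqs):
        for off in range(len(seq) - k + 1):
            index[seq[off:off + k]].append((sid, off))
    return index


def longest_matches(query, db_seqs, index, k):
    """Hypothetical helper: for each query position i (up to len(query) - k),
    return (length, sid, off) for the longest exact match that starts at
    query[i] and at offset off of database sequence sid, or None if the
    k-tuple at i never occurs in the database. Matches shorter than k are
    not reported."""
    result = []
    for i in range(len(query) - k + 1):
        best = None
        # Expected O(1) dictionary lookup of the k-tuple starting at i.
        for sid, off in index.get(query[i:i + k], ()):
            seq = db_seqs[sid]
            length = k
            # Extend the k-tuple hit to the right while the characters agree.
            while (i + length < len(query) and off + length < len(seq)
                   and query[i + length] == seq[off + length]):
                length += 1
            if best is None or length > best[0]:
                best = (length, sid, off)
        result.append(best)
    return result


# Usage with toy protein sequences and k = 4.
db = ["MKTAYIAKQR", "GAYIAKQW"]
idx = build_dictionary(db, 4)
print(longest_matches("TAYIAKQRX", db, idx, 4))
# → [(8, 0, 2), (7, 0, 3), (6, 0, 4), (5, 0, 5), (4, 0, 6), None]
```

Extending each hit naively costs time proportional to the match length times the number of hits, so this sketch favors clarity over the speed a production annotator would need; only the tuple lookup itself is constant time.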

This paper was cited by:

How small is the center of science? Short cross-disciplinary cycles in co-authorship graphs
Chris Fields
Scientometrics. Feb 2015, Vol. 102, No. 2: 1287-1306
Comparisons of traditional and novel stochastic models for the generation of daily precipitation occurrences
W.W. Ng, U.S. Panu
Journal of Hydrology. Jan 2010, Vol. 380, No. 1-2: 222-236
Coding Exon Detection Using Comparative Sequences
Jing Wu, David Haussler
Journal of Computational Biology. Jul 2006, Vol. 13, No. 6: 1148-1164
Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems
Lior Pachter, Marina Alexandersson, Simon Cawley
Journal of Computational Biology. Apr 2002, Vol. 9, No. 2: 389-399
Applications of Supercomputers in Sequence Analysis and Genome Annotation
Gerard G. Dumancas
pp. 149-175
Users who read this article also read

Manolis Kellis, Nick Patterson, Bruce Birren, Bonnie Berger, Eric S. Lander
Journal of Computational Biology. Jul 2004: 319-355.