[BioPython] Paper in Bioinformatics

Scott T. Kelley kelleys@ucsu.colorado.edu
Tue, 24 Oct 2000 12:00:40 -0700


I just ran into a recent Python reference in Bioinformatics about
object-oriented parsing of biological databases that I thought members of
the list might find interesting (if you don't already know of it). If the
authors aren't already aware of (and members of) Biopython, they might be
worth contacting... Enjoy! -Scott

Bioinformatics 2000 Jul;16(7):628-638
Object-oriented parsing of biological databases with Python.

Ramu C, Gemund C, Gibson TJ

European Molecular Biology Laboratory, Meyerhofstrasse 1, Postfach
10.2209, Heidelberg, Germany.

[Record supplied by publisher]

Motivation: While database activity in the biological field is increasing
rapidly, rather little has been done to parse these databases in a simple,
object-oriented way.

Results: We present an elegant, simple, yet powerful way of parsing
biological flat-file databases, taking EMBL, SWISS-PROT and GENBANK as
examples. EMBL and SWISS-PROT differ little in format structure, whereas
GENBANK's format structure is very different from both. Extracting the
desired fields from an entry (for example, a sub-sequence with an
associated feature) for later analysis is a constant need in the
biological sequence-analysis community; this is illustrated with tools for
building new splice-site databases. The parser interface is abstract in
the sense that access to all the databases is independent of their
formats, since the parsing instructions are hidden.

Availability: The modules are available at
http://shag.embl-heidelberg.de:8000/Biopy/

Contact: chenna@embl-heidelberg.de

Supplementary information: http://shag.embl-heidelberg.de:8000/Biopy/
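The abstract's central idea, a single interface that hides each format's parsing rules, can be illustrated with a minimal Python sketch. This is not the authors' module or Biopython's API: the function names, the shared field names `accession`, `description` and `sequence`, and the simplified layout rules are hypothetical, chosen for illustration. It relies only on the broad layout difference the abstract mentions: EMBL and SWISS-PROT lines start with a two-letter line code (data from column 6), while GenBank puts keywords in the first 12 columns; all three close each entry with `//`.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical format-neutral raw entry: native field label -> lines of text.
Entry = Dict[str, List[str]]


def parse_embl_like(lines: List[str]) -> Iterator[Entry]:
    """EMBL / SWISS-PROT (simplified): two-letter code in columns 1-2,
    data from column 6 onward, '//' closing each entry."""
    entry: Entry = {}
    for line in lines:
        if line.startswith("//"):
            if entry:
                yield entry
            entry = {}
            continue
        code = line[:2].strip() or "SEQUENCE"  # sequence lines have a blank code
        if code == "XX":  # EMBL spacer lines carry no data
            continue
        entry.setdefault(code, []).append(line[5:].strip())
    if entry:
        yield entry


def parse_genbank(lines: List[str]) -> Iterator[Entry]:
    """GenBank (simplified): keyword in columns 1-12, indented
    continuation lines, '//' closing each entry."""
    entry: Entry = {}
    current: Optional[str] = None
    for line in lines:
        if line.startswith("//"):
            if entry:
                yield entry
            entry, current = {}, None
            continue
        if line and not line[0].isspace():  # a new keyword starts in column 1
            current = line[:12].strip()
            entry.setdefault(current, []).append(line[12:].strip())
        elif current is not None:  # continuation of the previous keyword
            entry[current].append(line.strip())
    if entry:
        yield entry


# Hypothetical mapping: parser plus native labels for each shared field name.
FORMATS = {
    "embl": (parse_embl_like, {"accession": "AC", "description": "DE", "sequence": "SEQUENCE"}),
    "swissprot": (parse_embl_like, {"accession": "AC", "description": "DE", "sequence": "SEQUENCE"}),
    "genbank": (parse_genbank, {"accession": "ACCESSION", "description": "DEFINITION", "sequence": "ORIGIN"}),
}


def read_entries(text: str, fmt: str) -> Iterator[Dict[str, str]]:
    """Yield entries keyed by shared field names, hiding format details."""
    parser, labels = FORMATS[fmt]
    for entry in parser(text.splitlines()):
        record = {}
        for name, label in labels.items():
            joined = " ".join(entry.get(label, []))
            if name == "sequence":
                # Drop position numbers and spacing; keep residue letters only.
                joined = "".join(ch for ch in joined if ch.isalpha()).upper()
            elif name == "accession" and joined.split():
                # Keep the first (primary) accession, minus the EMBL-style ';'.
                joined = joined.split()[0].rstrip(";")
            record[name] = joined.strip()
        yield record
```

Calling `read_entries(text, "embl")` or `read_entries(text, "genbank")` yields dictionaries with the same keys, so downstream code never sees format-specific labels. A real parser would also have to handle feature tables, format-specific line wrapping, and many more line types.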