[Biopython-dev] sequence format readers ?
thomas at cbs.dtu.dk
Wed Sep 5 14:45:40 EDT 2001
Hej,
To follow up on one of the discussions and questions at ISMB in Copenhagen,
- how are we going to proceed with the sequence format reader (the
biopython variant of readseq ...)?
Currently we only have parsers for Fasta, Embl and GenBank. What we
need is an internal format and functions/modules which can read/write:
Fasta
Embl
GenBank
GCG
Phylip
PIR
MSF
Nexus
Clustal
Mase
??? - more suggestions ?
I can write most of the rules, but I guess we have to define a smart base
class/parser - where plugging in a new format should only take 5 seconds ...
If we brainstorm on the design of the reader/writer, I could volunteer to
implement the format rules ...
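
To make the "plug in a new format" idea concrete, here is a rough sketch of
what such a base class could look like. Everything in it is an assumption for
discussion, not existing Biopython code: the names Record, SeqFormat, FORMATS
and convert are made up, and the Fasta rules are deliberately minimal. A new
format would only need to subclass SeqFormat, set a name, and supply read and
write.

```python
import io
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical internal format: one sequence plus free-form annotations."""
    id: str
    seq: str
    description: str = ""
    annotations: dict = field(default_factory=dict)


# Hypothetical registry: lower-case format name -> handler class.
FORMATS = {}


class SeqFormat:
    """Hypothetical base class; a subclass registers itself by setting `name`."""
    name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            FORMATS[cls.name.lower()] = cls

    def read(self, handle):
        raise NotImplementedError

    def write(self, records, handle):
        raise NotImplementedError


class Fasta(SeqFormat):
    """Minimal Fasta rules, enough to show the plug-in mechanism."""
    name = "fasta"

    def read(self, handle):
        rec_id, desc, chunks = None, "", []
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous entry, if there was one.
                if rec_id is not None:
                    yield Record(rec_id, "".join(chunks), desc)
                header = line[1:].split(None, 1)
                rec_id = header[0] if header else ""
                desc = header[1] if len(header) > 1 else ""
                chunks = []
            elif line.strip() and rec_id is not None:
                chunks.append(line.strip())
        if rec_id is not None:
            yield Record(rec_id, "".join(chunks), desc)

    def write(self, records, handle):
        for rec in records:
            handle.write(f">{rec.id} {rec.description}".rstrip() + "\n")
            # Wrap the sequence at 60 characters per line.
            for i in range(0, len(rec.seq), 60):
                handle.write(rec.seq[i:i + 60] + "\n")


def convert(in_handle, in_format, out_handle, out_format):
    """Read records in one registered format and write them in another."""
    reader = FORMATS[in_format.lower()]()
    writer = FORMATS[out_format.lower()]()
    writer.write(reader.read(in_handle), out_handle)
```

With something like this, convert(open("in.fasta"), "fasta", out, "embl")
would work as soon as an Embl subclass exists; the converter itself never
needs to know which formats are installed.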
Some things to consider:
* some formats are alignment based (e.g. clustal, phylip, nexus)
* some formats have loads of information which is lost when converted to a
less information-rich format (e.g. Embl -> Fasta). But Embl -> GenBank
should not lose any information
* some formats allow multiple entries, some not
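
One way to handle the three points above would be to let every format
declare a few traits, so that the converter can warn before anything is
lost. The sketch below is only an illustration: FormatTraits, TRAITS and
conversion_warnings are made-up names, and the trait values (especially the
"richness" ranks and the single-entry flag for GCG) are my assumptions and
would need checking against the real format rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatTraits:
    """Hypothetical per-format description used to check a conversion."""
    name: str
    alignment: bool    # format is alignment based
    multi_entry: bool  # format may hold more than one entry per file
    richness: int      # illustrative rank: higher keeps more annotation


# Illustrative values only; these are assumptions, not verified format specs.
TRAITS = {
    t.name: t
    for t in (
        FormatTraits("fasta", alignment=False, multi_entry=True, richness=1),
        FormatTraits("embl", alignment=False, multi_entry=True, richness=3),
        FormatTraits("genbank", alignment=False, multi_entry=True, richness=3),
        FormatTraits("clustal", alignment=True, multi_entry=True, richness=1),
        FormatTraits("gcg", alignment=False, multi_entry=False, richness=1),
    )
}


def conversion_warnings(src, dst, n_entries=1):
    """Return a list of human-readable warnings for a src -> dst conversion."""
    s, d = TRAITS[src], TRAITS[dst]
    warnings = []
    if d.richness < s.richness:
        warnings.append(f"{src} -> {dst} loses annotation")
    if d.alignment and not s.alignment:
        warnings.append(f"{dst} expects aligned sequences, {src} gives none")
    if n_entries > 1 and not d.multi_entry:
        warnings.append(f"{dst} holds one entry, got {n_entries}")
    return warnings
```

For example, conversion_warnings("embl", "fasta") would report the lost
annotation, while conversion_warnings("embl", "genbank") would come back
empty, which matches the Embl -> GenBank expectation above.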
back-in-the-sequence-format-jungle'ly yr's
-thomas
--
Sicheritz-Ponten Thomas, Ph.D CBS, Department of Biotechnology
thomas at biopython.org The Technical University of Denmark
CBS: +45 45 252489 Building 208, DK-2800 Lyngby
Fax +45 45 931585 http://www.cbs.dtu.dk/thomas
De Chelonian Mobile ... The Turtle Moves ...