[Biopython-dev] Output sequence files
Iddo Friedberg
idoerg at cc.huji.ac.il
Sat May 26 08:52:25 EDT 2001
Hi,
Well, I'm happy that I hit on something other people find necessary.
Always good to know one is not alone :)
On Fri, 25 May 2001, Brad Chapman wrote:
Brad:
:
: I think Sarah is right on this. Seq/MutableSeq classes do not store
: any useful annotations on the sequence (except the alphabet/type of
: the sequence). Things should focus on SeqRecord, which has all of the
: annotation stuff.
:
I concur. It's just that, as you said, SeqRecord would need to hold a lot
more to represent GenBank/SwissProt records properly. As it stands, it
seems good enough for the FASTA format.
But the big formats (anything not Fasta) are not really interconvertible,
except maybe GenBank <--> EMBL. So maybe what we need is just the
following:
1) {big formats} --> fasta converter
2) A writer for each of the formats (e.g. SProt.Record.write(handle))
3) EMBL <--> GenBank, but that's pretty superfluous
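Item 2 in the list above might look something like this minimal sketch. It assumes a hypothetical, heavily simplified SwissProt-style record (`SProtRecord`, with `entry_name`, `accessions`, `description`, `organism` and `sequence` fields); it is not the real `SProt.Record` class, which carries far more, and the lines it emits are only a rough approximation of the SwissProt flat-file layout.

```python
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class SProtRecord:
    """Hypothetical, stripped-down SwissProt record (not Biopython's SProt.Record)."""
    entry_name: str
    accessions: List[str]
    description: str
    organism: str
    sequence: str

    def write(self, handle: TextIO) -> None:
        """Write this record to an open handle using SwissProt-style line codes."""
        handle.write(f"ID   {self.entry_name}\n")
        handle.write("AC   " + " ".join(f"{acc};" for acc in self.accessions) + "\n")
        handle.write(f"DE   {self.description}\n")
        handle.write(f"OS   {self.organism}.\n")
        handle.write(f"SQ   SEQUENCE   {len(self.sequence)} AA;\n")
        # Sequence block: 60 residues per line, split into groups of 10.
        for start in range(0, len(self.sequence), 60):
            chunk = self.sequence[start:start + 60]
            groups = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
            handle.write("     " + " ".join(groups) + "\n")
        handle.write("//\n")


if __name__ == "__main__":
    # Usage: write one made-up record to an in-memory handle.
    record = SProtRecord("TEST_HUMAN", ["P99999"], "Made-up test protein.",
                         "Homo sapiens", "MKTAYIAKQR" * 7)
    out = io.StringIO()
    record.write(out)
    print(out.getvalue(), end="")
```

Because each writer lives on its own record class, it can emit every field that format knows about without forcing a shared data model on the other formats.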
The problem arises from annotation. Do you think it's feasible to write
a good GenPept (that's the GenBank translation database) <--> SwissProt
converter that will preserve everything? Or a PIR <--> SwissProt
converter? I think that anyone seeking to preserve annotation beyond the
bare bones (organism, accession, maybe references, etc.) would not want to
use a converter anyhow.
So the problem basically comes down to having a writer for each record
type, plus one for SeqRecord, which would be a generic record but could
only be written out in FASTA. This way we don't get caught up in trying to
create a monster data type that integrates all the information the
various formats like to preserve. (And I haven't even mentioned PDB
annotation yet!)
So maybe we just need a writer for each {database}.Record type, and a
to_fasta converter and writer in Tools.
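The to_fasta converter and writer could be sketched roughly as follows. This assumes, hypothetically, that every {database}.Record exposes an identifier, a one-line description and the raw sequence, possibly under different attribute names. `FastaRecord`, `to_fasta`, `write_fasta` and the attribute-name arguments are illustrative names, not existing Biopython API. The converter keeps only the bare-bones information FASTA can hold and deliberately drops everything else.

```python
import io
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterable, TextIO


@dataclass
class FastaRecord:
    """Bare-bones generic record: roughly all a FASTA entry can carry."""
    id: str
    description: str
    sequence: str


def to_fasta(record, id_attr: str = "entry_name",
             desc_attr: str = "description",
             seq_attr: str = "sequence") -> FastaRecord:
    """Convert a database-specific record to a FastaRecord.

    Only identifier, description and sequence are copied; every other
    annotation on `record` is dropped on purpose.
    """
    return FastaRecord(
        id=str(getattr(record, id_attr)),
        description=str(getattr(record, desc_attr, "")),
        sequence=str(getattr(record, seq_attr)),
    )


def write_fasta(records: Iterable[FastaRecord], handle: TextIO,
                width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for rec in records:
        # rstrip() avoids a trailing space when the description is empty.
        handle.write(f">{rec.id} {rec.description}".rstrip() + "\n")
        for start in range(0, len(rec.sequence), width):
            handle.write(rec.sequence[start:start + width] + "\n")


if __name__ == "__main__":
    # A made-up GenPept-like record whose fields use different names;
    # the attribute-name arguments map them onto the FASTA fields.
    genpept = SimpleNamespace(locus="AB000001", definition="made-up protein",
                              sequence="M" * 70, features=["CDS 1..210"])
    out = io.StringIO()
    write_fasta([to_fasta(genpept, id_attr="locus", desc_attr="definition")], out)
    print(out.getvalue(), end="")
```

Relying on attribute names rather than a common base class keeps the per-database Record classes independent; the only thing they must share is enough information for a FASTA header and sequence.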
Of course, we could beef up SeqRecord to carry a bit more than bare-bones
annotation, for functional reasons and not only for flat-file writing,
but that's a different topic.
Iddo
--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg at cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/