[Biopython-dev] Bioformat module
Andrew Dalke
adalke at mindspring.com
Fri Jan 4 05:37:51 EST 2002
Brad:
>I'll second the "Wow! That's cool" from Jeff :-).
Thanks! to both of you. And I guess you're running a
2.2 version of Python, since I have some 'yield' statements
in there. :)
> After some
>small modifications to the GenBank format, I got GenBank minimally
>working with it.
There's going to be a few more changes. I've been working on
standard tag names for things like identifiers, cross-references,
sequence, and features (with qualifiers). Seems to work
well with SWISS-PROT and EMBL. The idea is to do
Std.dbid(UntilSep(delimiter = ";"), {"type": "accession"})
and it puts in the correct tags.
(BTW, I'm going to change "delimiter" to "sep".)
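
To make the "standard tag" idea concrete, here is a toy sketch in plain Python. It is not Martel code and does not use the Bioformat API: until_sep() and tag() are hypothetical helpers, and the XML-like output is only an assumed stand-in for whatever tags the real Std.dbid emits. It shows reading up to a separator and wrapping the result in a standard element with a "type" attribute.

```python
from xml.sax.saxutils import escape, quoteattr


def until_sep(text, sep=";"):
    """Hypothetical helper: split text at the first separator.

    Returns (text before sep, text after sep).
    """
    head, _, rest = text.partition(sep)
    return head, rest


def tag(name, value, attrs):
    """Hypothetical helper: wrap value in an XML-like element."""
    attr_text = "".join(
        " %s=%s" % (key, quoteattr(val)) for key, val in attrs.items()
    )
    return "<%s%s>%s</%s>" % (name, attr_text, escape(value), name)


# An invented accession-style field, just for illustration.
line = "P12345; Q67890;"
accession, remainder = until_sep(line, sep=";")
tagged = tag("dbid", accession.strip(), {"type": "accession"})
print(tagged)
```

The point is only that every format expression for an accession would end up under the same tag name, so downstream code need not know which format it came from.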
>Attached is the format registration stuff, that
>goes in Bioformats/formats/genbank.py for anyone who is interested
>in duplicating this.
Wasn't attached.
>>>> infile.seek(0)
Shouldn't need that. The identification code should always
reseek the file to the beginning after it's finished.
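
A minimal sketch of that behaviour, assuming a seekable file object. identify_format() and the checks list are hypothetical names for illustration, not the actual Bioformat identification code, and the prefix tests are deliberately crude.

```python
import io


def identify_format(infile, checks, peek_size=1024):
    """Return the name of the first format whose check accepts the
    start of the file, or None if nothing matches.

    The file position is restored before returning, so the caller
    does not need its own infile.seek(0).
    """
    start = infile.tell()  # the beginning, for a freshly opened file
    try:
        head = infile.read(peek_size)
    finally:
        infile.seek(start)  # restored even if read() raises
    for name, looks_like in checks:
        if looks_like(head):
            return name
    return None


# Toy checks; real format identification needs far more than a prefix.
checks = [
    ("genbank", lambda text: text.startswith("LOCUS")),
    ("fasta", lambda text: text.startswith(">")),
]

handle = io.StringIO("LOCUS       EXAMPLE\n")
result = identify_format(handle, checks)
position_after = handle.tell()
print(result, position_after)
```

Putting the seek in a finally block means the position is restored on every exit path, which is what lets callers skip the explicit seek.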
>I'm definitely +1 on checking this into CVS. It seems along the
>same spirit as what Thomas was working on in Bio/SeqIO/generic, but
>integrates well with Martel.
It was. I looked through the mailings to make sure I read
his (and others') discussions. It's also (IMNSHO) much better
than the Bioperl and BioJava codes because it can handle
non-sequence formats, like BLAST results, as well.
Should it be under Bio (Bio.Bioformats) or parallel to it?
Unlike Martel, I don't see it as being distributed outside
of Biopython, so I would think under. And I think the
Biopython code will have hooks to it as well. Okay, so under
it is.
> I'm not sure if I really have the full
>picture of everything yet, but from what I see it looks good!
I'm giving a short talk Friday morning. I think I know what
I'm doing well enough now that tomorrow evening I should be
able to write an overview level description of the project.
BTW, for me it was even harder to figure out the full picture.
I had to do one piece at a time until it finally started to
come together.
>I'm excited about the mixin stuff as well -- it seems like it'll
>really simplify a lot of repetitive coding for adding new formats.
>Too bad I already did all the repetitive coding for GenBank :-).
That was part of the small pieces -- see what works well then
try to abstract from there.
Mixins, however, turned out to be a dead end. There was a problem
when multiple mixins wanted the same events. There was also the
annoyance of having to __ all object variables in the hopes of
not getting conflicts with other classes. So I used a different
approach which actually makes things easier to understand, I hope.
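
The two mixin problems can be reproduced in a few lines. This is an illustrative sketch only: the class names and the start_entry event are invented, and it shows the problem rather than the replacement approach, which this message does not describe.

```python
class TitleMixin:
    def start_entry(self):
        # The double underscore triggers name mangling: the attribute
        # is really stored as _TitleMixin__seen.
        self.__seen = "title"


class SequenceMixin:
    def start_entry(self):
        # Stored as _SequenceMixin__seen, so it cannot clash with
        # the attribute set by TitleMixin.
        self.__seen = "sequence"


class Handler(TitleMixin, SequenceMixin):
    pass


handler = Handler()

# Both mixins want the same event, but normal method lookup finds
# only the first one in the MRO; SequenceMixin never hears about it.
handler.start_entry()
attrs_after_event = sorted(vars(handler))

# Calling the second mixin explicitly shows the mangled names
# living side by side without conflict.
SequenceMixin.start_entry(handler)
attrs_after_explicit = sorted(vars(handler))

print(attrs_after_event)
print(attrs_after_explicit)
```

The __ prefix keeps the instance variables apart, but nothing in plain multiple inheritance delivers one event to both mixins unless each method cooperates in forwarding it.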
Like I said, tomorrow evening... Hopefully.
Andrew
dalke at dalkescientific.com