[Biopython-dev] Reading sequences: FormatIO, SeqIO, etc
Peter
biopython-dev at maubp.freeserve.co.uk
Wed Aug 16 14:00:36 UTC 2006
(I changed the subject to that of the previous discussion, as this
isn't really about "contributing comparative genomics tools")
Albert Krewinkel wrote:
> Hello,
>
> I read Peter's SeqIO/__init__.py replacement and if I may say so: I
> love it. Thanks a lot for this! Still, there are some things I'd
> like to talk about.
Thank you :) The code is on Bug 2059 for anyone who hasn't looked yet.
http://bugzilla.open-bio.org/show_bug.cgi?id=2059
> The _parse_genbank_features function could also be used to parse EMBL
> or DDBJ features, therefore I think it should be named differently.
First of all, that bit of code is for a new feature which I personally
wanted - to be able to iterate over CDS features in a GenBank file.
But yes, I did have in mind that it (and the GenBank parser) could be
re-used to deal with EMBL files. I have not yet taken the time to
learn the EMBL file format and how it corresponds to the GenBank file
format - but I agree a lot of the code could be shared.
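As a rough illustration of how that sharing could work, here is a minimal sketch of one feature-table parser serving both formats. It assumes the common DDBJ/EMBL/GenBank feature table layout: the feature key starts in column 6 and the location or qualifier text starts in column 22, with EMBL lines carrying an extra "FT" line code in columns 1-2. The names parse_features and iter_cds are hypothetical. This is not the _parse_genbank_features code on Bug 2059, and features here are plain tuples rather than SeqFeature objects.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical feature representation (not Biopython's SeqFeature):
# (feature key, location string, {qualifier name: [values]})
Feature = Tuple[str, str, Dict[str, List[str]]]

# Assumed shared layout: key from column 6, location/qualifiers from
# column 22 (1-based). These are the equivalent 0-based offsets.
KEY_START, VALUE_START = 5, 21


def _normalise(lines: Iterable[str], embl: bool) -> Iterator[str]:
    """Yield feature-table lines in the GenBank column layout."""
    for line in lines:
        line = line.rstrip("\n")
        if embl:
            if not line.startswith("FT"):
                continue  # skip ID, FH, XX, SQ and sequence lines
            line = "  " + line[2:]  # blank out the 'FT' line code
        yield line


def _finish(key: str, location: str, quals: Dict[str, List[str]]) -> Feature:
    # Drop the surrounding double quotes from qualifier values.
    return key, location, {k: [v.strip('"') for v in vs] for k, vs in quals.items()}


def parse_features(lines: Iterable[str], embl: bool = False) -> Iterator[Feature]:
    """Parse a GenBank or EMBL/DDBJ feature table with one shared routine."""
    key: Optional[str] = None
    location = ""
    quals: Dict[str, List[str]] = {}
    last_qual: Optional[str] = None
    for line in _normalise(lines, embl):
        if line[:KEY_START].strip():
            # Text in columns 1-5 (e.g. FEATURES, ORIGIN) is outside the table.
            if key is not None:
                break
            continue
        key_field = line[KEY_START:VALUE_START].strip()
        value = line[VALUE_START:].strip()
        if key_field:
            # A new feature key: emit the previous feature first.
            if key is not None:
                yield _finish(key, location, quals)
            key, location, quals, last_qual = key_field, value, {}, None
        elif key is None or not value:
            continue
        elif value.startswith("/"):
            name, _, val = value[1:].partition("=")
            quals.setdefault(name, []).append(val)
            last_qual = name
        elif last_qual is not None:
            # Simplification: wrapped qualifier text is joined with a space
            # (a real parser would join e.g. /translation without one).
            quals[last_qual][-1] += " " + value
        else:
            location += value  # wrapped location, e.g. a long join(...)
    if key is not None:
        yield _finish(key, location, quals)


def iter_cds(lines: Iterable[str], embl: bool = False) -> Iterator[Feature]:
    """Yield only the CDS features."""
    return (f for f in parse_features(lines, embl) if f[0] == "CDS")
```

The only format-specific step is _normalise, so the same parse_features call can handle either file type, and a CDS-only iterator is a simple filter on top.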
> Since there is a lot of clean up effort right now: How about moving
> the SeqRecord and SeqFeature objects into the Bio.Seq module? They
> are closely related and separate modules only clutter the namespace.
What real benefit does that give us? It will cause a certain amount
of upheaval in the short term as people will have to change their
import statements on existing scripts. If we do start a new branch
for "big changes" then I have no real problem with this suggestion.
> To me, this seems to be a general problem. It's very difficult to find
> a tool to use for a certain problem if one doesn't already know what
> to look for. I'd pretty much favour creating modules like
> Bio.structure to group modules like Bio.PDB and Bio.NMR etc. This is
> a very big change, and therefore I'd like to follow Marc's suggestion
> of splitting off a branch. In general, I pretty much agree with what
> Marc said in his <rant />.
>
> I cannot estimate how much work it would be to maintain two separate
> biopython distributions, so please forgive me if I re-suggest
> something completely idiotic here. I just don't believe there is much
> that could be lost that way.
BioPython probably would benefit from a little reorganising - and for
anything drastic like moving entire modules about, a new branch makes
sense. On the other hand, do we have the man-power to do it? Are any
of the developers familiar with all of (or even most of) the existing
modules? I would guess I have used less than half of the modules - I
have looked at the very basics of Bio.PDB for example, but have never
tried Bio.NMR.
I would favour gradual incremental (and backwards compatible) changes.
For example, adding a new sequence reading module and then marking the
old code as deprecated.
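One common way to do that kind of backwards-compatible deprecation in Python is to keep the old entry point working but have it delegate to the new code and emit a DeprecationWarning. The sketch below assumes a hypothetical new reader called parse (only FASTA is sketched) and a hypothetical old function called old_fasta_reader; neither name is taken from Biopython.

```python
import warnings
from typing import Iterable, Iterator, List, Tuple


def parse(lines: Iterable[str], fmt: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical new reader yielding (title, sequence); only FASTA here."""
    if fmt != "fasta":
        raise ValueError(f"unsupported format: {fmt!r}")
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def old_fasta_reader(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical old entry point, kept so existing scripts still run."""
    warnings.warn(
        "old_fasta_reader is deprecated; use parse(lines, 'fasta') instead",
        DeprecationWarning,
        stacklevel=2,  # report the warning at the caller's line
    )
    return list(parse(lines, "fasta"))
```

Existing scripts keep working unchanged. Note that since Python 3.7 a DeprecationWarning is only shown by default when triggered directly from __main__; running with `python -W default` makes it visible everywhere.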
As examples of some small changes, have any of you looked at:
Bug 2057 - SeqRecord has no __str__ or __repr__
http://bugzilla.open-bio.org/show_bug.cgi?id=2057
Bug 1963 - Adding __str__ method to codon tables and translators
http://bugzilla.open-bio.org/show_bug.cgi?id=1963
Little things in themselves that I think would help.
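To show the kind of thing those two bugs are asking for, here is a minimal sketch of __repr__ and __str__ methods on two toy classes. ToyRecord and ToyCodonTable are hypothetical stand-ins written for illustration. They are not Biopython's SeqRecord or CodonTable classes, and the output layouts are just one possible choice.

```python
from typing import Dict, Iterable


class ToyRecord:
    """Hypothetical SeqRecord-like object holding a plain string sequence."""

    def __init__(self, seq: str, id: str = "<unknown id>", description: str = ""):
        self.seq = seq
        self.id = id
        self.description = description

    def __repr__(self) -> str:
        # Unambiguous, constructor-style form for the interactive prompt.
        return (f"{type(self).__name__}(seq={self.seq!r}, id={self.id!r}, "
                f"description={self.description!r})")

    def __str__(self) -> str:
        # Friendlier summary for print(); long sequences are shortened.
        seq = self.seq if len(self.seq) <= 20 else self.seq[:17] + "..."
        return (f"ID: {self.id}\nDescription: {self.description}\n"
                f"Sequence: {seq} ({len(self.seq)} letters)")


class ToyCodonTable:
    """Hypothetical codon table: codon -> amino acid, plus stop codons."""

    def __init__(self, name: str, forward_table: Dict[str, str],
                 stop_codons: Iterable[str]):
        self.name = name
        self.forward_table = forward_table
        self.stop_codons = set(stop_codons)

    def __str__(self) -> str:
        # Grid in the usual TCAG order: one row per (first, third) base,
        # one column per second base; unknown codons are shown as '?'.
        bases = "TCAG"
        lines = [f"Table: {self.name}"]
        for first in bases:
            for third in bases:
                cells = []
                for second in bases:
                    codon = first + second + third
                    if codon in self.stop_codons:
                        aa = "Stop"
                    else:
                        aa = self.forward_table.get(codon, "?")
                    cells.append(f"{codon} {aa:<4}")
                lines.append("  ".join(cells).rstrip())
        return "\n".join(lines)
```

With these in place, typing a record at the prompt shows its contents instead of a bare object address, and print(table) gives a readable grid.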
Peter