[Biopython-dev] Reading sequences: FormatIO, SeqIO, etc
Peter (BioPython Dev)
biopython-dev at maubp.freeserve.co.uk
Tue Aug 22 16:46:39 UTC 2006
Albert Krewinkel wrote:
> I'd like to seriously start working on an EMBL parser, but ...
As the de-facto GenBank module owner, I'm also interested in getting EMBL
and GenBank working nicely together. The big question BEFORE you/we
start any serious coding on EMBL support is how it fits into BioPython.
Do we (a) add a new module like the existing Bio.Fasta and Bio.GenBank,
or (b) use a new framework like the one I've put forward here:
http://bugzilla.open-bio.org/show_bug.cgi?id=2059
> ... there are some things I'm concerned about: It surely would be a
> good thing to build the SequenceIO and Parser stuff upon some base
> classes and agree on using certain tools which are (or will be) used
> in the whole project.
What I was proposing was that all the new sequence file format parsers
should be implemented as subclasses of my SequenceIterator class -
either directly (e.g. FastaIterator) or indirectly (e.g. the
PfamStockholmIterator) and they should return SeqRecord objects.
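A minimal sketch of that layout, assuming a much simplified design: the
SeqRecord below is a stand-in dataclass rather than Bio.SeqRecord, and the
SequenceIterator and FastaIterator bodies are illustrative guesses written
in current Python, not the code attached to the bug report.

```python
import io
from dataclasses import dataclass


@dataclass
class SeqRecord:
    """Simplified stand-in for Bio.SeqRecord.SeqRecord (assumption)."""
    id: str
    seq: str
    description: str = ""


class SequenceIterator:
    """Hypothetical base class: wraps a handle, yields one record per step."""

    def __init__(self, handle):
        self.handle = handle

    def __iter__(self):
        return self

    def __next__(self):
        raise NotImplementedError("Subclasses must implement __next__")


class FastaIterator(SequenceIterator):
    """Direct subclass: reads one FASTA record at a time."""

    def __init__(self, handle):
        super().__init__(handle)
        self._pending = None  # header line read ahead while ending a record

    def __next__(self):
        line = self._pending or self.handle.readline()
        self._pending = None
        # Skip anything before the next ">" header line.
        while line and not line.startswith(">"):
            line = self.handle.readline()
        if not line:
            raise StopIteration
        description = line[1:].strip()
        parts = description.split(None, 1)
        seq_id = parts[0] if parts else ""
        chunks = []
        while True:
            line = self.handle.readline()
            if not line:
                break
            if line.startswith(">"):
                self._pending = line  # keep the header for the next call
                break
            chunks.append(line.strip())
        return SeqRecord(id=seq_id, seq="".join(chunks),
                         description=description)


handle = io.StringIO(">a first\nACGT\nAC\n>b\nGG\n")
records = list(FastaIterator(handle))
```

Every format-specific iterator would expose the same interface, so calling
code can loop over records without caring which file format is behind it.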
I am open to discussion about how interlaced file formats should be
handled, but I think I have shown how the SequenceIterator based scheme
could work using the ClustalW and Stockholm formats as examples.
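To illustrate why interlaced formats are the awkward case, here is a rough
sketch assuming a simplified ClustalW-like layout (name, whitespace,
sequence fragment, repeated in blocks). The function name iter_interlaced is
hypothetical and this is not a complete ClustalW parser; it only shows that
the whole file must be read before the first record is complete.

```python
import io


def iter_interlaced(handle):
    """Yield (name, sequence) pairs from a simplified interlaced alignment."""
    order = []    # names in order of first appearance
    chunks = {}   # name -> list of sequence fragments, one per block
    for line in handle:
        # Skip blank lines, the header line, and consensus lines
        # (which start with whitespace in this simplified layout).
        if not line.strip() or line[0].isspace() or line.startswith("CLUSTAL"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, fragment = fields[0], fields[1]
        if name not in chunks:
            order.append(name)
            chunks[name] = []
        chunks[name].append(fragment)
    # Only now, with every block read, can complete records be produced.
    for name in order:
        yield name, "".join(chunks[name])


text = (
    "CLUSTAL W (1.83) multiple sequence alignment\n"
    "\n"
    "seq1  ACGT\n"
    "seq2  AC-T\n"
    "      ** *\n"
    "\n"
    "seq1  GGA\n"
    "seq2  GGC\n"
    "      ** \n"
)
result = list(iter_interlaced(io.StringIO(text)))
```

The caller still sees one record at a time, but the memory cost is that of
the whole alignment, which is the trade-off open to discussion.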
> Since I never received any education/training on software
> development, I would appreciate it if someone could tell me what the code's
> structure should look like -- the current Scanner/Consumer code
> isn't any help.
I agree that the current Scanner/Consumer code won't be much help.
The current Bio.GenBank parser uses the Scanner/Consumer model because I
rewrote (in Python) what had been done using Martel/Mindy. This is one
excuse for the state of that code of mine ;)
I don't think the flexibility of the Scanner/Consumer model is needed
just to turn EMBL/GenBank data into SeqRecord objects (and only into
SeqRecord objects).
> How about using reStructuredText in docstrings? IMO it leaves the
> .__doc__ string very readable but improves epydoc generated
> descriptions.
I'm not familiar with how any existing API documentation is extracted
from the source code...
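For reference, a small sketch of what a reStructuredText docstring might
look like, assuming epydoc's documented field syntax and the module-level
__docformat__ marker. The count_records function is a hypothetical example,
not existing Biopython code.

```python
# Tells epydoc to parse the docstrings in this module as reStructuredText.
__docformat__ = "restructuredtext en"


def count_records(lines):
    """Count FASTA records in an iterable of lines.

    :param lines: iterable of text lines, for example an open file handle
    :type lines: iterable of str
    :return: number of lines starting with ``>``
    :rtype: int
    """
    return sum(1 for line in lines if line.startswith(">"))
```

The raw .__doc__ string stays readable as plain text, which is the point
Albert raises above.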
Peter