[Biopython-dev] generic format reader interface

Andrew Dalke dalke at acm.org
Mon Apr 9 02:27:21 EDT 2001


We've been putting the different formats under Bio/*.  Bioperl
makes things available through a standard interface at Bio::IO.
I like the biopython way since I think that's the only way to
capture everything a database might do, but I also see the need
for a centralized way to do I/O.

What I'm thinking of is a centralized registry, which lets you
specify:
  - input data type (some unique string, like "swissprot version='38'"
                     or a tuple like ("swissprot", "38") )
  - requested record type (another unique string, like "Seq" or
                         "SProt")

The registry's lookup function would return an iterator for that
input type and record type.  For example:

  iterator = Bio.IO.parseFile(open("sprot.dat"), input="swissprot",
                              record="Seq")

  while 1:
      record = iterator.next()
      if record is None:
          break
      ... work with the Seq record ...

I'm not sure of the details yet, but this sort of interface lets
the registry resolve to the best parser available for that need.
E.g., it could be something which reads the record into a SProt and
then converts the SProt to a Seq, or it could go directly from the
record to a Seq, if someone wants to write the appropriate
specialization.
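
A minimal sketch of how such a registry could look, assuming a plain
dictionary keyed by (input type, record type).  Every name here
(register, a module-level parseFile, LineRecordIterator, the "toy"
format) is hypothetical and made up for illustration; this is not an
existing Biopython API.  The iterator follows the convention used in
the loop above, where next() returns None once the input is exhausted.

```python
import io

# Hypothetical registry: (input type, record type) -> factory taking a file.
_registry = {}


def register(input, record, factory):
    """Register a factory that builds an iterator for this type pair."""
    # "input" shadows the builtin, but mirrors the keyword used in the proposal.
    _registry[(input, record)] = factory


def parseFile(infile, input, record):
    """Look up the registered factory and return its iterator."""
    try:
        factory = _registry[(input, record)]
    except KeyError:
        raise ValueError("no parser for input=%r, record=%r" % (input, record))
    return factory(infile)


class LineRecordIterator:
    """Toy iterator: one record per non-blank line; next() returns None at end."""

    def __init__(self, infile, build):
        self._infile = infile
        self._build = build

    def next(self):
        for line in self._infile:
            line = line.strip()
            if line:
                return self._build(line)
        return None


# Toy stand-in for a real format: each line is a record, upper-cased on output.
register("toy", "upper", lambda f: LineRecordIterator(f, str.upper))

iterator = parseFile(io.StringIO("abc\n\ndef\n"), input="toy", record="upper")
records = []
while 1:
    record = iterator.next()
    if record is None:
        break
    records.append(record)
```

A real registry would presumably also need a way to choose between
several parsers registered for the same pair; the sketch simply keeps
the last one registered.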

What would be really nice is if the API had the ability to allow
something like

  iterator = Bio.IO.parseFile(open("sprot.dat"), input="swissprot",
                              record="fasta")

and have this return each record as a FASTA-formatted string.
This would work either because:
  - there is a swissprot -> FASTA string builder directly
or
  - there is a swissprot -> Seq builder and a Seq -> FASTA converter
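
One way the two cases above could be handled uniformly is to treat
registered converters as edges in a graph and search for the shortest
chain from the input type to the requested type.  The sketch below
assumes that approach; register_converter, find_chain, convert, and the
toy record layouts (a dict for "swissprot", a tuple for "Seq", and the
made-up entry) are all hypothetical stand-ins, not real Biopython types.

```python
from collections import deque

# Hypothetical converter table: (source type, target type) -> function.
_converters = {}


def register_converter(source, target, func):
    _converters[(source, target)] = func


def find_chain(source, target):
    """Breadth-first search for the shortest list of converters."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        for (src, dst), func in _converters.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [func]))
    raise ValueError("no conversion from %r to %r" % (source, target))


def convert(value, source, target):
    """Apply each converter in the chain in turn."""
    for func in find_chain(source, target):
        value = func(value)
    return value


# Toy converters: swissprot dict -> (id, sequence) tuple -> FASTA string.
register_converter("swissprot", "Seq", lambda rec: (rec["id"], rec["sequence"]))
register_converter("Seq", "fasta", lambda seq: ">%s\n%s\n" % seq)

entry = {"id": "P12345", "sequence": "MKV"}  # made-up example data
steps_before = len(find_chain("swissprot", "fasta"))  # via Seq
fasta = convert(entry, "swissprot", "fasta")

# Once a direct builder is registered, the search prefers it.
register_converter(
    "swissprot", "fasta",
    lambda rec: ">%s\n%s\n" % (rec["id"], rec["sequence"]),
)
steps_after = len(find_chain("swissprot", "fasta"))
```

Because the search is breadth-first, a direct swissprot -> FASTA
builder is picked over the two-step route through Seq whenever both are
registered, which matches the "best parser available" idea.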

Still thinking about it.

					Andrew


