[BioPython] Bio.SeqIO and Bio.AlignIO

Peter biopython at maubp.freeserve.co.uk
Wed May 9 13:13:23 EDT 2007


Dear Biopython people,

I hope most of you have tried, or at least looked at, the new Bio.SeqIO
code introduced in Biopython 1.43.  It is described in the 
cookbook/tutorial and on the website here:

http://www.biopython.org/wiki/SeqIO

Additional feedback would be welcome.  For example, are we missing the
ability to read or write your favourite file format?  Or should I
prioritise reading the annotation in Swiss-Prot files (Enhancement Bug
2235)?

Also, I have been thinking that perhaps we do need alignment-specific
handling code after all.  The current Bio.SeqIO interface works fine for
files containing a single alignment (made up of multiple sequences).
However, some files can hold multiple alignments...

Two examples that come to mind are (1) re-sampled alignments held as
concatenated PHYLIP files, produced by the seqboot tool in the PHYLIP
suite, and (2) multiple pairwise alignments produced by the EMBOSS
programs needle and water.

What I have in mind is an extension of the Bio.SeqIO functions which
work on SeqRecords, to a set of Bio.AlignIO functions which work on
Alignment objects.  That is, the AlignIO.parse() function would be an
iterator returning alignment objects, and the AlignIO.write() function
would accept Alignment objects.

This would allow code like this:

from Bio import AlignIO
for alignment in AlignIO.parse(open("many.phy"), "phylip"):
    print "Alignment with %i sequences of length %i" \
          % (len(alignment.get_all_seqs()),
             alignment.get_alignment_length())
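
To make the iterator idea more concrete, here is a rough standalone
sketch of what such a parser might do for concatenated PHYLIP files.
This is not Biopython code: the function name parse_concatenated_phylip
is made up for illustration, each alignment is returned as a plain list
of (name, sequence) tuples rather than an Alignment object, and it
assumes the strict layout with 10-character names and every sequence on
a single line.

```python
import io


def parse_concatenated_phylip(handle):
    """Yield one alignment at a time from concatenated PHYLIP data.

    Hypothetical helper for illustration only.  Each alignment is a list
    of (name, sequence) tuples.  Assumes 10-character names and that
    every sequence fits on one line.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file, no more alignments
        if not header.strip():
            continue  # skip blank lines between alignments
        # The header line gives the number of sequences and their length
        n_seqs, length = (int(x) for x in header.split())
        records = []
        for _ in range(n_seqs):
            line = handle.readline().rstrip("\n")
            name = line[:10].strip()
            seq = line[10:].replace(" ", "")
            if len(seq) != length:
                raise ValueError("Sequence %r has length %i, expected %i"
                                 % (name, len(seq), length))
            records.append((name, seq))
        yield records


# Two small alignments held back to back, as seqboot output would be
example = io.StringIO(
    " 2 8\n"
    + "Alpha".ljust(10) + "ACGTACGT\n"
    + "Beta".ljust(10) + "ACGTAC-T\n"
    + " 2 8\n"
    + "Alpha".ljust(10) + "ACGGACGT\n"
    + "Beta".ljust(10) + "ACGGAC-T\n"
)

for alignment in parse_concatenated_phylip(example):
    print("Alignment with %i sequences of length %i"
          % (len(alignment), len(alignment[0][1])))
```

Because the function is a generator, only one alignment is held in
memory at a time, which is the main attraction of the iterator approach
for files holding many re-sampled alignments.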


Would anyone like to comment on the scheme?  See also Bug 2285
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

Note that this is also a good time to talk about enhancing the Alignment
object itself - something Marc has raised on Bug 1944
http://bugzilla.open-bio.org/show_bug.cgi?id=1944

Peter

