[Biopython-dev] [Bug 2285] New: Creating Bio.AlignIO to cope with alignments like Bio.SeqIO does sequences

bugzilla-daemon at portal.open-bio.org
Mon Apr 30 09:20:37 UTC 2007


http://bugzilla.open-bio.org/show_bug.cgi?id=2285

           Summary: Creating Bio.AlignIO to cope with alignments like
                    Bio.SeqIO does sequences
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


I would like to introduce an alignment equivalent of the SeqRecord-based
Bio.SeqIO module. Sample usage:

from Bio import AlignIO
# One Alignment object is returned per alignment in the file
for alignment in AlignIO.parse(open("many.phy"), "phylip"):
    print("Alignment with %i sequences of length %i"
          % (len(alignment.get_all_seqs()),
             alignment.get_alignment_length()))

As with Bio.SeqIO, there would be an input function "parse" which returns an
iterator, but one giving Alignment objects rather than SeqRecords. In my own
experience most alignment files contain a single alignment, but this is not
true in general.

For example, given a concatenated PHYLIP file (e.g. produced by seqboot)
containing five alignments of four sequences each, AlignIO.parse() would
return an iterator giving five Alignment objects. Currently Bio.SeqIO won't
parse such PHYLIP files, but I would make it return twenty SeqRecords.
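The five-versus-twenty distinction can be sketched with a small reader for concatenated PHYLIP files. This is a hypothetical illustration, not Biopython's PHYLIP parser: it assumes strict sequential PHYLIP only (a header line "N L", then N lines of a 10-character name field followed by the whole sequence), ignores the interleaved layout, and uses made-up data and a made-up `flatten` helper to show what a record-oriented reader would return instead.

```python
import io


def parse_concatenated_phylip(handle):
    """Yield one alignment (list of (name, sequence) tuples) per PHYLIP block.

    Sketch assumptions: strict *sequential* PHYLIP, i.e. a header line
    "N L" followed by N lines, each a 10-character name field and then
    the whole sequence. Interleaved PHYLIP is not handled here.
    """
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between alignments
        n_seqs, length = (int(x) for x in header.split()[:2])
        alignment = []
        while len(alignment) < n_seqs:
            line = next(lines, None)
            if line is None:
                raise ValueError("File ended inside an alignment")
            if not line.strip():
                continue
            name = line[:10].strip()
            seq = line[10:].replace(" ", "")
            if len(seq) != length:
                raise ValueError("%s has length %i, expected %i"
                                 % (name, len(seq), length))
            alignment.append((name, seq))
        yield alignment


def flatten(alignments):
    """Hypothetical helper: what a record-oriented (SeqIO-style) reader returns."""
    return [record for alignment in alignments for record in alignment]


# Five concatenated alignments of four sequences each (made-up data).
block = "4 8\n" + "".join("%-10s%s\n" % ("seq%i" % j, "ACGTAC-T")
                          for j in range(4))
alignments = list(parse_concatenated_phylip(io.StringIO(block * 5)))
print(len(alignments), len(flatten(alignments)))  # 5 20
```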

Clustalw and Stockholm files can also be concatenated, and can be used to
hold many different multiple sequence alignments.
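For Stockholm, each alignment is terminated by a "//" line, so a concatenated file splits naturally into records. The reader below is a hypothetical sketch, not Biopython code: it skips every "#" line (the header and the #=GF/#=GS/#=GR/#=GC markup) instead of parsing it, and reassembles sequences that are split over several blocks by concatenating pieces that share a name.

```python
import io


def parse_stockholm(handle):
    """Yield one alignment (list of (name, sequence) tuples) per Stockholm record.

    Sketch assumptions: every record ends with a "//" line; all lines
    starting with "#" are skipped rather than parsed; a sequence split
    over several blocks is rebuilt by concatenating pieces with the same name.
    """
    seqs = {}  # dicts keep insertion order (Python 3.7+), so row order survives
    for line in handle:
        line = line.strip()
        if line == "//":
            yield list(seqs.items())
            seqs = {}
        elif line and not line.startswith("#"):
            name, piece = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + piece.replace(" ", "")


text = (
    "# STOCKHOLM 1.0\n"
    "#=GF ID first\n"
    "a  ACG-\n"
    "b  AC-T\n"
    "\n"
    "a  TT\n"
    "b  TA\n"
    "//\n"
    "# STOCKHOLM 1.0\n"
    "x  GGG\n"
    "y  G-G\n"
    "z  GG-\n"
    "//\n"
)
for alignment in parse_stockholm(io.StringIO(text)):
    print(alignment)
# [('a', 'ACG-TT'), ('b', 'AC-TTA')]
# [('x', 'GGG'), ('y', 'G-G'), ('z', 'GG-')]
```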

Another example is the EMBOSS simple/pairs alignment format, which again is
frequently used to hold more than one alignment. I plan to add support for
this file format in Bio.AlignIO.

Details:
I would move and rewrite the existing Bio.SeqIO code for clustal, stockholm and
phylip formats to Bio.AlignIO, and modify the Bio.SeqIO parse and write
functions to offload work to Bio.AlignIO when there is no Bio.SeqIO handler
defined.
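The proposed fallback, where Bio.SeqIO offloads to Bio.AlignIO when it has no handler of its own, might look roughly like this. All names here (`seqio_parse`, `seqio_write`, the four registries, the "toy" format) are hypothetical and do not describe the real module internals; a record is just an (id, sequence) tuple standing in for a SeqRecord. Reading flattens every alignment into records; writing wraps all records into a single alignment, which requires equal lengths.

```python
import io

# Hypothetical stand-in for a SeqRecord: an (id, sequence) tuple.
# An alignment is then just a list of such records.


def _toy_alignment_iterator(handle):
    # Invented alignment-only format: "id sequence" per line,
    # blank line between alignments.
    current = []
    for line in handle:
        line = line.strip()
        if not line:
            if current:
                yield current
                current = []
            continue
        name, seq = line.split()
        current.append((name, seq))
    if current:
        yield current


def _toy_alignment_writer(alignments, handle):
    for i, alignment in enumerate(alignments):
        if i:
            handle.write("\n")  # blank line between alignments
        for name, seq in alignment:
            handle.write("%s %s\n" % (name, seq))


# Hypothetical registries; the real Bio.SeqIO / Bio.AlignIO internals may differ.
SEQIO_READERS = {}  # formats SeqIO would handle itself (none in this sketch)
SEQIO_WRITERS = {}
ALIGNIO_READERS = {"toy": _toy_alignment_iterator}
ALIGNIO_WRITERS = {"toy": _toy_alignment_writer}


def seqio_parse(handle, fmt):
    if fmt in SEQIO_READERS:
        return SEQIO_READERS[fmt](handle)
    if fmt in ALIGNIO_READERS:
        # No SeqIO handler: let the alignment parser do the work,
        # then flatten each alignment into its individual records.
        return (record
                for alignment in ALIGNIO_READERS[fmt](handle)
                for record in alignment)
    raise ValueError("Unknown format %r" % fmt)


def seqio_write(records, handle, fmt):
    records = list(records)
    if fmt in SEQIO_WRITERS:
        return SEQIO_WRITERS[fmt](records, handle)
    if fmt in ALIGNIO_WRITERS:
        # Treat all the records as one alignment, which only makes
        # sense if every sequence has the same length.
        if len({len(seq) for _, seq in records}) > 1:
            raise ValueError("Sequences must all be the same length")
        return ALIGNIO_WRITERS[fmt]([records], handle)
    raise ValueError("Unknown format %r" % fmt)


data = "a ACGT\nb AC-T\n\nc GGCC\nd GG-C\ne G-CC\n"
records = list(seqio_parse(io.StringIO(data), "toy"))
print(len(records))  # 5: both alignments, flattened
out = io.StringIO()
seqio_write(records[:2], out, "toy")
print(out.getvalue(), end="")
```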

Progress:
I already have this working for parsing clustal, stockholm and phylip files,
and converting the writers is underway.
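On the writing side, an alignment-aware writer can accept several alignments and emit them back to back in one file. The `write_phylip` function below is a hypothetical sketch, not Biopython's writer: it assumes strict sequential PHYLIP, pads or truncates names to the 10-character strict-PHYLIP limit, and returns the number of alignments written.

```python
import io


def write_phylip(alignments, handle):
    """Write each alignment as a strict sequential PHYLIP block.

    Sketch assumptions: names are padded or truncated to 10 characters
    (truncation can make distinct names collide; this sketch does not
    check for that), each sequence goes on one line, and blocks are
    simply concatenated one after another.
    """
    count = 0
    for alignment in alignments:
        lengths = {len(seq) for _, seq in alignment}
        if len(lengths) != 1:
            raise ValueError("Need a non-empty alignment of equal-length sequences")
        handle.write(" %i %i\n" % (len(alignment), lengths.pop()))
        for name, seq in alignment:
            handle.write("%-10s%s\n" % (name[:10], seq))
        count += 1
    return count


out = io.StringIO()
write_phylip([[("alpha", "ACGT"), ("a_very_long_name", "AC-T")],
              [("x", "GG"), ("y", "G-")]], out)
print(out.getvalue(), end="")
```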

I aim to attach a (big) patch shortly, and would like some feedback/discussion
on the idea.

See also Bug 1944, which would enhance the Alignment class; Bio.AlignIO can
be implemented without that work.
