[Biopython-dev] Bio.Sequencing

Mon Jun 29 03:49:26 EDT 2009

2009/6/29 Peter Cock <p.j.a.cock at googlemail.com>

> On Sun, Jun 28, 2009 at 3:10 PM, Cymon Cox<cymon.cox at googlemail.com>
> wrote:
> > Hi Peter,
> >
> > What is the long-term future of Bio.Sequencing? With the (very cool)
> > QualityIO stuff now in SeqIO, the Phd module looks a bit out of place - is
> > there any reason not to move both Ace and Phd code to SeqIO, i.e.
> > in the AceIO and PhdIO interfaces?
>
> In the case of FASTQ and QUAL files, everything gets stored in
> the SeqRecord, so I didn't see any reason to have something in
> Bio.Sequencing (although perhaps things like mapping between
> the PHRED and Solexa scores could live there, along with the
> basic parser used internally giving string tuples - does this sound
> worth doing?).
>
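
For reference, the PHRED/Solexa mapping mentioned above follows from the two score definitions (PHRED is -10*log10(p), Solexa is -10*log10(p/(1-p)), where p is the error probability). A minimal sketch in plain Python; the function names are made up for illustration and are not an existing Bio.Sequencing API, and rejecting non-positive PHRED scores is just one possible way to handle the undefined case:

```python
import math


def phred_from_solexa(solexa_q):
    """Convert a Solexa quality score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (solexa_q / 10.0) + 1)


def solexa_from_phred(phred_q):
    """Convert a PHRED quality score to the equivalent Solexa score.

    The formula is undefined for PHRED scores <= 0, so this sketch
    simply rejects them (an assumption, not a documented convention).
    """
    if phred_q <= 0:
        raise ValueError("PHRED score must be positive for this conversion")
    return 10 * math.log10(10 ** (phred_q / 10.0) - 1)
```

The two scales agree closely for high-quality bases and only diverge at the low end, which is why the conversion matters mainly for poor reads.
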
> As you know, currently the SeqIO "ace" and "phd" formats are simply built
> on top of Bio.Sequencing.Ace and Bio.Sequencing.Phd, and only
> transform a subset of the data into a SeqRecord object.

Yes, but now that per-letter annotations are in SeqRecord there is no
reason not to store the Phd 'phred_qualities' and 'peak_locations', so all
the Phd file attributes can be stored in a SeqRecord - I altered the parser
to do this.
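
To illustrate the idea, here is a rough sketch of per-letter annotations in plain Python. The AnnotatedRead class is a hypothetical stand-in written for this example, not the real SeqRecord, and the values are invented; the only point is that each annotation is a list with exactly one entry per base:

```python
class AnnotatedRead:
    """Hypothetical stand-in for a record with per-letter annotations."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence
        self.letter_annotations = {}

    def annotate(self, key, values):
        """Attach one value per base, refusing lists of the wrong length."""
        values = list(values)
        if len(values) != len(self.sequence):
            raise ValueError(
                "%s needs %i values, got %i"
                % (key, len(self.sequence), len(values))
            )
        self.letter_annotations[key] = values


# Made-up data for a four-base read.
read = AnnotatedRead("read1", "ctcc")
read.annotate("phred_qualities", [9, 9, 10, 19])
read.annotate("peak_locations", [6, 18, 26, 38])
```

With both lists held per base, the sequence, quality and peak position of every site travel together in one object.
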

> This also
> describes the SwissProt parsing now - the general model is that we have
> a SeqRecord interface (which may not cover all the details), and
> underlying, more file-format-specific objects used to hold the data.
>

> > I ask because I've written a Phd writer class for the SeqIO interface
> > and initially added it to PhdIO.
>
> Do you want to file an enhancement bug, and then either upload
> the code to bugzilla, or give a link to a github branch so we can
> have a look?
>
> If your writer takes SeqRecord objects, then I think it would make
> sense to go in Bio.SeqIO.PhdIO (as I have done for GenBank,
> although this is in part because I have some intentions to simplify
> the Bio.GenBank code, and having another writer with another
> API in there would make this more complicated).
>
> It would also make sense to have a writer in Bio.Sequencing.Phd
> taking its Record objects (and have Bio.SeqIO turn SeqRecord
> objects into Phd Record objects, and call that). Perhaps this would
> be a better idea as it is more flexible, but it would be more work,
> and could be slower ;)

Yes, this was my concern. As I have it now, the parser code is in
Bio.Sequencing.Phd and is called by Bio.SeqIO.PhdIO, but the writer code
is in PhdIO. I could move write_record to the Phd module for symmetry,
but as all the Phd attributes can be stored in a SeqRecord, the Phd parser
code could just as rationally be moved to PhdIO.
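
To make the two options concrete, here is a rough sketch of the layered variant in plain Python: a low-level writer that takes a format-specific record, plus a thin wrapper that converts per-letter annotations into that record first. All names are hypothetical and are not the code in my branch, and the output is deliberately simplified (no comment block, only the DNA section):

```python
import io
from dataclasses import dataclass


@dataclass
class PhdLikeRecord:
    """Hypothetical format-specific record: one tuple per base."""
    name: str
    sites: list  # (base, quality, peak_location) tuples


def write_phd_like(record, handle):
    """Low-level writer: takes the format-specific record."""
    handle.write("BEGIN_SEQUENCE %s\n" % record.name)
    handle.write("BEGIN_DNA\n")
    for base, quality, peak in record.sites:
        handle.write("%s %i %i\n" % (base, quality, peak))
    handle.write("END_DNA\n")
    handle.write("END_SEQUENCE\n")


def to_phd_like(name, sequence, letter_annotations):
    """Convert generic per-letter annotations to the format-specific record."""
    sites = list(zip(sequence,
                     letter_annotations["phred_qualities"],
                     letter_annotations["peak_locations"]))
    return PhdLikeRecord(name, sites)


def write_from_annotations(name, sequence, letter_annotations, handle):
    """High-level writer: convert first, then reuse the low-level writer."""
    write_phd_like(to_phd_like(name, sequence, letter_annotations), handle)


# Made-up data for a three-base read.
handle = io.StringIO()
write_from_annotations(
    "read1", "ctc",
    {"phred_qualities": [9, 9, 10], "peak_locations": [6, 18, 26]},
    handle,
)
```

The extra conversion step is the cost Peter mentions: it keeps both entry points available, but every record is copied once before it is written.
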

Cheers, C.
--