[Biopython-dev] Bio.Sequencing

Peter Cock p.j.a.cock at googlemail.com
Mon Jun 29 07:58:00 UTC 2009


>> As you know, currently the SeqIO "ace" and "phd" parsers are simply built
>> on top of Bio.Sequencing.Ace and Bio.Sequencing.Phd, and only
>> transform a subset of the data into a SeqRecord object.
>
> Yes, but now that per-letter annotations are in SeqRecord there is no
> reason not to store the Phd 'phred_qualities' and 'peak_locations', so all
> the Phd file attributes can be stored in a SeqRecord. I altered the parser
> to do this.
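To illustrate the idea, here is a rough sketch in plain Python (no Biopython needed) of how the per-base PHRED data could be split into a sequence plus one list per annotation. The function name and the "phred_quality"/"peak_location" keys are assumptions for illustration, not necessarily what the altered parser uses.

```python
from typing import Dict, List, Tuple

# One PHD "site" per base: (base, phred_quality, peak_location).
Site = Tuple[str, int, int]


def sites_to_letter_annotations(sites: List[Site]) -> Tuple[str, Dict[str, List[int]]]:
    """Split PHD sites into a sequence string plus per-letter annotation lists.

    Hypothetical helper: each list gets exactly one entry per base, which is
    the shape SeqRecord.letter_annotations expects.
    """
    seq = "".join(base for base, _, _ in sites)
    annotations = {
        "phred_quality": [qual for _, qual, _ in sites],
        "peak_location": [peak for _, _, peak in sites],
    }
    return seq, annotations
```

With Biopython installed, the returned pair could then be handed to SeqRecord through its letter_annotations argument, which requires each list to be the same length as the sequence.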

Cool, that sounds like it might be worth including in Biopython 1.51
final (if you think it is ready for prime time). If, as you say, your
extended Bio.SeqIO.PhdIO parser covers all the data in the PHD
file, then perhaps we could consider deprecating Bio.Sequencing.Phd
in the future.

>> > I ask because I've written a Phd writer class for the SeqIO interface
>> > and initially added it to PhdIO.
>>
>> Do you want to file an enhancement bug, and then either upload
>> the code to bugzilla, or give a link to a github branch so we can
>> have a look?
>>
>> If your writer takes SeqRecord objects, then I think it would make
>> sense to go in Bio.SeqIO.PhdIO (as I have done for GenBank,
>> although this is in part because I have some intentions to simplify
>> the Bio.GenBank code, and having another writer with another
>> API in there would make this more complicated).
>>
>> It would also make sense to have a writer in Bio.Sequencing.Phd
>> taking its Record objects (and have Bio.SeqIO turn SeqRecord
>> objects into PhD Record objects, and call that). Perhaps this would
>> be a better idea as it is more flexible, but it would be more work,
>> and could be slower ;)
>
> Yes, this was my concern. As I have it now, the parser code is in
> Bio.Sequencing.Phd and is called by the Bio.SeqIO.PhdIO, but
> the writer code is in PhdIO. I could move the write_record to the
> Phd module for symmetry, but as all the Phd attributes can be
> stored in SeqRecord, the Phd parser code could just as rationally
> be moved to PhdIO.
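For comparison, a sketch of the Record-based option quoted above: a writer that only knows about a PHD record, plus a small adapter that builds one from SeqRecord-style data. PhdRecord, write_phd_record and seqrecord_to_phd are hypothetical names, and the output is a minimal PHD layout; real PHD files carry metadata such as CHROMAT_FILE in the comment section, which this sketch leaves empty.

```python
import io
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PhdRecord:
    """Hypothetical stand-in for a Bio.Sequencing.Phd-style Record."""
    name: str
    sites: List[Tuple[str, int, int]] = field(default_factory=list)


def write_phd_record(record: PhdRecord, handle) -> None:
    """Write one record as a minimal PHD block (comment section left empty)."""
    handle.write(f"BEGIN_SEQUENCE {record.name}\n\n")
    handle.write("BEGIN_COMMENT\n\nEND_COMMENT\n\n")
    handle.write("BEGIN_DNA\n")
    # One "base quality peak" line per site.
    for base, qual, peak in record.sites:
        handle.write(f"{base} {qual} {peak}\n")
    handle.write("END_DNA\n\nEND_SEQUENCE\n")


def seqrecord_to_phd(name: str, seq: str,
                     letter_annotations: Dict[str, List[int]]) -> PhdRecord:
    """Adapter: turn SeqRecord-style data into the record the writer expects."""
    quals = letter_annotations["phred_quality"]  # assumed key name
    peaks = letter_annotations["peak_location"]  # assumed key name
    return PhdRecord(name, list(zip(seq, quals, peaks)))
```

The extra conversion step is where both the flexibility and the possible slowdown come from: every SeqRecord is copied into an intermediate object before anything is written.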

For now, having the writer in Bio.SeqIO.PhdIO seems fine. As a
second step we could make the Bio.SeqIO.PhdIO parser self-contained,
and as a third step, declare Bio.Sequencing.Phd obsolete.
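As a rough idea of what a self-contained Bio.SeqIO.PhdIO parser might need, here is a minimal sketch that reads only the sequence name and the BEGIN_DNA/END_DNA block of each record and ignores comments and tags. The function name parse_phd and the annotation keys are assumptions, and there is no error handling for malformed files.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_phd(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str, Dict[str, List[int]]]]:
    """Yield (name, sequence, per-letter annotations) for each PHD record."""
    name: Optional[str] = None
    in_dna = False
    bases: List[str] = []
    quals: List[int] = []
    peaks: List[int] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("BEGIN_SEQUENCE"):
            # Start a fresh record; the name follows the keyword.
            name = line[len("BEGIN_SEQUENCE"):].strip()
            bases, quals, peaks = [], [], []
        elif line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif line == "END_SEQUENCE":
            yield name, "".join(bases), {"phred_quality": quals, "peak_location": peaks}
        elif in_dna and line:
            # Each DNA line is "base quality peak_location".
            base, qual, peak = line.split()[:3]
            bases.append(base)
            quals.append(int(qual))
            peaks.append(int(peak))
```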

Peter


