[Biopython-dev] Bio.SeqIO
Peter
biopython-dev at maubp.freeserve.co.uk
Wed Mar 7 23:16:48 UTC 2007
I have renamed SequenceToDict and SequencesToAlignment to to_dict and
to_alignment, which, as Chris Lasher pointed out, follows the PEP 8 Python
style guide.
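
For anyone who has not looked at the code, here is a rough sketch of the
kind of helper I mean by to_dict. It is not the actual Bio.SeqIO code: the
Record class is a hypothetical stand-in for SeqRecord, and raising
ValueError on a repeated id is an assumption made for this example.

```python
# Illustrative sketch only; not the Bio.SeqIO implementation.

class Record(object):
    """Hypothetical stand-in for SeqRecord, holding just an id and a seq."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


def to_dict(records):
    """Build a dictionary of records keyed by record.id.

    Assumption for this sketch: a repeated id raises ValueError
    instead of silently overwriting the earlier record.
    """
    result = {}
    for record in records:
        if record.id in result:
            raise ValueError("Duplicate key %r" % record.id)
        result[record.id] = record
    return result


records = [Record("alpha", "ACGT"), Record("beta", "GGCC")]
by_id = to_dict(records)
```

Because the whole input is consumed before the dictionary is returned, a
duplicate shows up before the caller has done anything with the records.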
While there may be better places for these two functions to live, leaving
them in SeqIO seems reasonable to me. Still, if we do want to move
them (or remove them) in the near future, it would be better to do this
before releasing Biopython 1.43.
Other than that, I think Bio.SeqIO is "ready" for its first release.
Michiel Jan Laurens de Hoon wrote:
> It may be a good idea to add a keyword allow_identical_keys (probably a
> better name is needed here), False by default, in SeqIO.parse to specify
> if SeqIO.parse should raise an exception if two records with an
> identical record.id are found. Whereas this is more of a problem when
> creating a dictionary, I think that this is also relevant in general.
I'm not very keen on this "allow_identical_keys" option for SeqIO.parse().
However, if we did want it, I think we could do the check in the
SeqIO.parse function itself (rather than repeating the code for each
underlying parser).
One catch is that the exception would get raised once a duplicate is
found - possibly after the user has already processed the first half of
the file.
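
To illustrate the catch, here is a minimal sketch of such a check written
as a generator wrapped around any record iterator. The names unique_ids
and Record are made up for this example and are not part of Bio.SeqIO.

```python
# Illustrative sketch only; unique_ids and Record are hypothetical names.
from collections import namedtuple

Record = namedtuple("Record", ["id", "seq"])


def unique_ids(records):
    """Yield records one at a time, raising ValueError on a repeated id."""
    seen = set()
    for record in records:
        if record.id in seen:
            raise ValueError("Duplicate record id %r" % record.id)
        seen.add(record.id)
        yield record


records = [Record("a", "AC"), Record("b", "GT"),
           Record("a", "TT"), Record("c", "GG")]

processed = []
error = None
try:
    for record in unique_ids(records):
        # The caller has already handled "a" and "b" by the time
        # the second "a" is reached and the exception is raised.
        processed.append(record.id)
except ValueError as exc:
    error = exc
```

Here the loop handles the first two records before the error appears, and
the record after the duplicate is never seen.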
>> Also, wouldn't this prevent us making a SeqRecord
>> inherit from Seq (another interesting idea you proposed in the past)?
>
> Not necessarily; there are two ways to avoid this:
> A) SeqRecord could inherit both from list and from Seq;
> B) Instead of letting SeqRecord inherit from list, we could add a next()
> and __iter__ method to the SeqRecord class (returning record.id and
> record, and then StopIteration); this will also let us create a
> dictionary with dict(SeqIO.parse(handle, format)).
I think I didn't make myself clear. I wanted to reserve the __iter__
method of the SeqRecord class for use like this:

    for residue in record :
        # assuming residue is also a SeqRecord object
        print residue.seq.tostring()

and similarly for __iter__ of a Seq class:

    for residue in seq :
        # assuming residue is also a Seq object
        print residue.tostring()

To me this syntax seems very natural, but does seem to block your clever
dict() plan.
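
A small sketch of why the two uses of __iter__ collide. All three classes
are hypothetical stand-ins, not the real Seq or SeqRecord, and option B is
written here as a generator rather than with an explicit next() method.

```python
# Illustrative sketch only; all three classes are hypothetical stand-ins.

class Seq(object):
    """Minimal sequence whose iteration yields one-letter Seq objects."""

    def __init__(self, data):
        self.data = data

    def tostring(self):
        return self.data

    def __iter__(self):
        for letter in self.data:
            yield Seq(letter)


class PairRecord(object):
    """Record in the style of option B: iterates as (id, record)."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        yield self.id
        yield self


class ResidueRecord(object):
    """Record that iterates over its residues, one sub-record each."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        for residue in self.seq:
            yield ResidueRecord(self.id, residue)


# Option B: dict() unpacks each record into a key and a value.
lookup = dict([PairRecord("alpha", Seq("ACG")), PairRecord("beta", Seq("TT"))])

# Per-residue iteration: natural in a for loop or comprehension.
residues = [r.seq.tostring() for r in ResidueRecord("alpha", Seq("ACG"))]
```

A class can only have one __iter__, so a record that iterates over its
residues cannot also unpack into an (id, record) pair for dict().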
Peter