[Biopython-dev] SeqIO and qual: Question about reading and writing qual files

Wed Mar 25 19:30:14 EDT 2009

On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi wrote:
>> Sebastian - could you have a quick play with this github code (using the new
>> UnknownSeq class), and the current CVS code (using None), and make sure
>> both support the slicing operations you were trying earlier?  Thanks.
>
> First I tried the CVS code (with None in seq), it worked.

OK, good.  That will do in the very short term - the UnknownSeq needs
some more testing and general approval before I'd check that in.

> Then I tried the git code and it also worked. One thing I noticed is
> that I got "?" instead of "N" as the "sequence" of the UnknownSeq.

I felt we shouldn't use an "N" unless we are confident the sequence
is nucleotides.  In practice, this is probably a safe assumption for
FASTQ and QUAL files - unless anyone can think of a counter example?
Do you think it is safe to assume FASTQ and QUAL files are just for
nucleotides?
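To make the "?" versus "N" choice concrete, here is a minimal sketch in plain Python with no Biopython imports. The class name UnknownSeqSketch is hypothetical and is not the actual UnknownSeq API. It stores only a length and a single placeholder character. That character defaults to "?" and becomes "N" only when the caller explicitly says the data are nucleotides.

```python
class UnknownSeqSketch:
    """Hypothetical stand-in for a sequence whose letters are unknown.

    Only the length is stored; every position reports the same
    placeholder character ("?" by default, "N" if the caller knows
    the data are nucleotides).
    """

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must be non-negative")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        # Only materialise the placeholder string when asked for it.
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # range() applies the same slice arithmetic as a real sequence
            # (negative indices, steps), so we only need its length.
            new_length = len(range(self._length)[index])
            return UnknownSeqSketch(new_length, self._character)
        # Single position: bounds-check like a string, then return the placeholder.
        if not -self._length <= index < self._length:
            raise IndexError("index out of range")
        return self._character
```

For example, `str(UnknownSeqSketch(4, "N"))` gives `"NNNN"`, while the default gives `"????"`. Making "N" opt-in means the sketch never claims the data are nucleotides unless the caller says so.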

I mean, you could translate a CDS from transcriptome sequencing,
and for the sake of argument give each amino acid a quality score
from the three nucleotide quality scores, and then save this as a protein
FASTQ file.  But I've never heard of anyone actually doing this ;)
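For the sake of argument, here is one hypothetical way to get an amino acid quality from the three nucleotide scores. It assumes PHRED scores, where the error probability is p = 10^(-Q/10). It also treats a codon as correct only if all three bases are correct. Neither the function name nor the combining scheme comes from Biopython or any FASTQ convention. Taking the minimum of the three scores would be an equally plausible choice.

```python
import math


def codon_qualities(nuc_quals):
    """Collapse per-nucleotide PHRED scores into one score per codon.

    Hypothetical scheme: codon error probability = 1 - product(1 - p_i),
    with p_i = 10 ** (-Q_i / 10). Any trailing partial codon is dropped,
    as it would be when translating. Scores are assumed to be in the
    usual PHRED range (0 to about 93).
    """
    result = []
    for start in range(0, len(nuc_quals) - 2, 3):
        p_correct = 1.0
        for q in nuc_quals[start:start + 3]:
            p_correct *= 1.0 - 10 ** (-q / 10)
        p_error = 1.0 - p_correct
        # Convert the combined error probability back to a PHRED score.
        result.append(round(-10 * math.log10(p_error)))
    return result
```

Under this scheme, three bases at Q40 combine to a codon score of about Q35. The codon is three times as likely to contain an error as any single base.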

> From a practical point of view, both versions are the same, but the
> concept of UnknownSeq looks more solid than None, because if I didn't know
> about Biopython internals, I would never try to slice a None
> seq. With "None":
> len(s) returns:
>
> Traceback (most recent call last):
> ...
> TypeError: object of type 'NoneType' has no len()
>
> So I would never try to do:
> new_s = s[10:30]
>
> But with the UnknownSeq object, len(s) returns an actual length, so it
> is more intuitive that it can be sliced.
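The length is known even without any letters because a QUAL file holds one score per base. Counting a record's scores therefore gives its sequence length. Below is a minimal stdlib-only sketch of that reading step. The name read_qual is a hypothetical helper, not the Bio.SeqIO parser. It assumes each record is a ">" title line followed by whitespace-separated integer scores, which may span several lines.

```python
def read_qual(lines):
    """Yield (title, scores) pairs from the lines of a QUAL file."""
    title, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if title is not None:
                yield title, scores
            title, scores = line[1:], []
        elif title is None:
            raise ValueError("QUAL data found before the first '>' title line")
        else:
            scores.extend(int(word) for word in line.split())
    if title is not None:
        yield title, scores
```

Given a record's scores, a placeholder such as `"?" * len(scores)` has the right length, so `len()` and slicing like `[10:30]` behave as they would for a real sequence.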

I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord
__getitem__ code nicer, and it means you can do len(SeqRecord) too,
which was problematic if the sequence was None.
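With a placeholder that has a length, a record can slice its sequence and its per-letter quality scores in the same way. No special case for a missing sequence is needed. RecordSketch below is hypothetical and much simpler than the real SeqRecord, which keeps per-letter data in its letter_annotations dictionary. The sequence can be any object supporting `len()` and slicing, such as a plain string or a length-only placeholder.

```python
class RecordSketch:
    """Hypothetical minimal record: an id, a sequence-like object and
    one quality score per letter (as read from a QUAL file)."""

    def __init__(self, record_id, seq, qualities):
        # With seq=None this len() call raises TypeError, which is
        # exactly the problem a length-aware placeholder avoids.
        if len(seq) != len(qualities):
            raise ValueError("need exactly one quality score per letter")
        self.record_id = record_id
        self.seq = seq
        self.qualities = list(qualities)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slicing")
        # Sequence and scores are sliced identically, so they stay aligned.
        return RecordSketch(self.record_id, self.seq[index], self.qualities[index])
```

For example, `RecordSketch("read1", "??????", [10, 20, 30, 40, 50, 60])[1:4]` has length 3 and scores `[20, 30, 40]`. Passing `None` as the sequence fails immediately with a TypeError.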

>
> I liked the github interface, may I set up my own repository?
>

Yes - this is one of the nice things about git: it makes it easy for anyone
to make their own local branch of Biopython, but keep it under version
control and pull in changes from the master branch (or another git user)
quite easily.  It should also make it easy to offer changes back to the
main project (assuming we do switch to hosting it on git; for now it is
still being done via CVS).  However, bear in mind this is still only a test
migration, and it is still possible we'll have to redo the CVS to git
migration.  There is a long (and ongoing) thread on this mailing list
about all this already, with an evolving wiki page:
http://biopython.org/wiki/GitMigration

Peter