[Biopython] internal function to convert illumina quality scores to phred

Brad Chapman chapmanb at 50mail.com
Tue Feb 1 16:03:04 UTC 2011


Alan and Peter;
Alan, nice suggestions on conversion from phred. On the barcode
sorting side there was just some discussion of this on the
development list; I have a script that does barcode sorting
and trimming with mismatches using Biopython:

https://github.com/chapmanb/bcbb/blob/master/nextgen/scripts/barcode_sort_trim.py

It does not use qualities, but this might be a framework you could
build off to add that support.

Peter, how hard do you think it would be to have SeqIO only convert
from the fastq encoding to phred scores on demand? Most of the time
when dealing with fastq I do not need any conversion at all and use
the FastqGeneralIterator to just pull out the name, sequence and
quality. 

You've done a lot of nice work with the correct conversions and it
would be great to expose that directly through on-demand conversion,
as Alan is suggesting. Ideally you would use SeqIO as normal with
fastq files, but the quality score would not be converted to solexa
during parsing; the conversion would only happen once
letter_annotations["solexa_quality"] was accessed.
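
To make the on-demand idea concrete, here is a rough sketch in plain
Python. It is not how SeqIO works today; LazyFastqRecord and its
attribute names are made up for illustration, and it assumes
Illumina 1.3+ style strings (PHRED scores at an ASCII offset of 64).
The raw quality string is stored as-is and only decoded, then cached,
the first time the scores are requested:

```python
class LazyFastqRecord:
    """Hypothetical record that decodes qualities only when asked."""

    def __init__(self, title, sequence, quality_string, offset=64):
        self.title = title
        self.sequence = sequence
        self.quality_string = quality_string  # raw encoding, left untouched
        self._offset = offset                 # assumed ASCII offset (Illumina 1.3+)
        self._phred = None                    # cache, filled on first access

    @property
    def phred_quality(self):
        # Decode lazily: no work is done unless someone reads this attribute.
        if self._phred is None:
            self._phred = [ord(ch) - self._offset for ch in self.quality_string]
        return self._phred


record = LazyFastqRecord("read1", "ACGT", "hhh@")
print(record.quality_string)  # parsing cost only, no conversion yet
print(record.phred_quality)   # conversion happens here, once
```

Code that never touches phred_quality pays nothing for the
conversion, which is the behaviour I am after.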

Another option would just be to expose a function so folks
could do:

convert_fastq_illumina_to_quality(illumina_encoded_string)

to get the phred quality scores for a string they were interested
in. This way you could use FastqGeneralIterator for no SeqRecord/Seq
overhead, but still make use of your conversion work.
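
Something along these lines is what I have in mind. This is only a
sketch: the function name is the hypothetical one from above, not an
existing Biopython function, and it assumes Illumina 1.3+ encoding
(PHRED scores stored at an ASCII offset of 64). The tuples below are
hand-written stand-ins with the same (title, sequence, quality)
shape that FastqGeneralIterator yields:

```python
ILLUMINA_OFFSET = 64  # assumed: Illumina 1.3+ stores PHRED + 64


def convert_fastq_illumina_to_quality(illumina_encoded_string):
    """Return PHRED scores for an Illumina 1.3+ encoded quality string."""
    scores = [ord(ch) - ILLUMINA_OFFSET for ch in illumina_encoded_string]
    if scores and min(scores) < 0:
        # Characters below '@' cannot be valid under this encoding.
        raise ValueError("quality string is not Illumina 1.3+ encoded")
    return scores


# Stand-in for the tuples you would get from FastqGeneralIterator.
reads = [
    ("read1", "ACGT", "hhh@"),
    ("read2", "GGCC", "BBBB"),
]

for title, seq, qual in reads:
    # Convert only for the reads we actually care about.
    if title == "read1":
        print(title, convert_fastq_illumina_to_quality(qual))
```

Reads that are skipped never get converted, so the common case stays
as cheap as plain string handling.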

Brad


