[Biopython] internal function to convert illumina quality scores to phred

Alan Bergland bergland at stanford.edu
Mon Jan 31 14:54:23 EST 2011


Hi Peter,

	Thanks for the quick reply.  I am trying to iterate through two
large FASTQ files (one file per end of the paired-end reads) and split
the read pairs by one of 9 barcodes found at the 5' end of both reads
in each pair.  I would like to use the quality scores of those barcode
bases to decide which barcode group each pair belongs to.
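One way to use the barcode qualities is to score each candidate barcode by summing the PHRED scores of the mismatched bases, so a mismatch on a low-quality base costs little. The sketch below assumes that approach, offset-64 ("fastq-illumina") qualities, and a made-up two-entry barcode set, since the post does not list the real 9 barcodes. `BARCODES`, `mismatch_cost`, `assign_pair`, `split_pairs` and the `max_cost` threshold are all hypothetical names. Only `FastqGeneralIterator` comes from Biopython.

```python
# HYPOTHETICAL barcodes: the post mentions 9 barcodes but does not list them.
BARCODES = {
    "bc1": "ACGTAC",
    "bc2": "TGCATG",
}

ILLUMINA_OFFSET = 64  # "fastq-illumina" (Illumina 1.3+) stores PHRED as chr(q + 64)


def mismatch_cost(seq, qual, barcode):
    """Sum the PHRED scores at positions where seq disagrees with barcode.

    zip() stops at the shortest input, so only the barcode-length prefix
    of the read is compared. A mismatch on a low-quality base is cheap.
    """
    return sum(
        ord(q) - ILLUMINA_OFFSET
        for base, ref, q in zip(seq, barcode, qual)
        if base != ref
    )


def assign_pair(rec1, rec2, barcodes, max_cost=60):
    """Pick the barcode with the lowest combined cost over both 5' ends.

    rec1 and rec2 are (title, sequence, quality) tuples as yielded by
    FastqGeneralIterator. Returns None if even the best barcode costs
    more than max_cost (an arbitrary, hypothetical threshold).
    """
    _, seq1, qual1 = rec1
    _, seq2, qual2 = rec2
    costs = {
        name: mismatch_cost(seq1, qual1, bc) + mismatch_cost(seq2, qual2, bc)
        for name, bc in barcodes.items()
    }
    best = min(costs, key=costs.get)
    return best if costs[best] <= max_cost else None


def split_pairs(handle1, handle2, barcodes):
    """Yield (barcode name or None, rec1, rec2) for each read pair."""
    # Biopython is only imported once iteration actually starts.
    from Bio.SeqIO.QualityIO import FastqGeneralIterator

    pairs = zip(FastqGeneralIterator(handle1), FastqGeneralIterator(handle2))
    for rec1, rec2 in pairs:
        yield assign_pair(rec1, rec2, barcodes), rec1, rec2
```

Because `split_pairs` is a generator, only one pair is held in memory at a time. Note that `zip` stops silently at the shorter file, so a real script should check that both files contain the same number of records.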

	I think FastqGeneralIterator is a good fit because I don't need to
convert the quality scores for the full 100 bp read back and forth
while I iterate through the file.  As I understand it, SeqIO.parse and
SeqIO.write decode and re-encode the quality scores of every base, and
there is no need to do that for the whole read.
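Since FastqGeneralIterator returns plain strings, a read can be written back out without its qualities ever being decoded. This is a minimal sketch, assuming the records are the (title, sequence, quality) tuples that FastqGeneralIterator yields. `format_fastq` and `write_raw` are hypothetical helper names, not Biopython functions.

```python
import io


def format_fastq(title, seq, qual):
    """Rebuild a four-line FASTQ record from FastqGeneralIterator strings.

    The quality string is copied verbatim, so the scores are never
    decoded or re-encoded.
    """
    return "@%s\n%s\n+\n%s\n" % (title, seq, qual)


def write_raw(records, handle):
    """Write (title, seq, qual) tuples to handle and return how many were written."""
    count = 0
    for title, seq, qual in records:
        handle.write(format_fastq(title, seq, qual))
        count += 1
    return count


# In-memory handle for illustration; a real script would pass an open output file.
out = io.StringIO()
n = write_raw([("r1 extra", "ACGT", "hhTB")], out)
print(n, repr(out.getvalue()))
```

Combined with a barcode assignment step, each output file could receive `write_raw` calls for its own group of reads, leaving the 100 bp quality strings untouched.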
	
	I've written a little snippet of code that simply prints the quality  
scores from the barcodes:

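from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqIO.QualityIO import FastqGeneralIterator

# pe2 (the second end) is opened but not used yet in this snippet
with open("head2_pe1.fastq") as pe1, open("head2_pe2.fastq") as pe2:
    # FastqGeneralIterator yields plain (title, sequence, quality) strings
    for title, seq, qual in FastqGeneralIterator(pe1):
        # Keep only the 6 barcode bases at the 5' end of the read
        bc = SeqRecord(Seq(seq[:6]), id="a")
        # Raw Illumina-encoded quality characters (not PHRED integers)
        bc.letter_annotations["fastq-illumina"] = qual[:6]
        print(bc.letter_annotations["fastq-illumina"])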

This just prints the Illumina-encoded quality characters.  How would I
print the PHRED scores instead?
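In the "fastq-illumina" encoding (Illumina 1.3 to 1.7), each quality character is the PHRED score plus 64, so decoding six barcode characters needs nothing more than `ord()`. This is a minimal sketch, assuming the file really uses offset 64. The older Solexa scale is different, and Bio.SeqIO.QualityIO provides `phred_quality_from_solexa` for that case. Standard Sanger FASTQ uses offset 33 instead. The helper names and the sample quality string are hypothetical.

```python
ILLUMINA_OFFSET = 64  # "fastq-illumina": character = chr(PHRED + 64)
SANGER_OFFSET = 33    # standard "fastq": character = chr(PHRED + 33)


def illumina_to_phred(qual):
    """Decode an Illumina 1.3+ quality string into a list of PHRED integers."""
    return [ord(ch) - ILLUMINA_OFFSET for ch in qual]


def phred_to_sanger(scores):
    """Re-encode PHRED integers as a Sanger FASTQ quality string."""
    return "".join(chr(q + SANGER_OFFSET) for q in scores)


# Made-up six-character barcode quality string, for illustration only.
barcode_qual = "hhhhTB"
scores = illumina_to_phred(barcode_qual)
print(scores)                   # [40, 40, 40, 40, 20, 2]
print(phred_to_sanger(scores))  # IIII5#
```

Applied to the first six characters of the quality string (the third element of each FastqGeneralIterator tuple), `illumina_to_phred` gives the barcode's PHRED scores directly, without building a SeqRecord at all.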

Thanks,
Alan


	
On Jan 31, 2011, at 11:36 AM, Peter wrote:

> On Mon, Jan 31, 2011 at 6:50 PM, Alan Bergland  
> <bergland at stanford.edu> wrote:
>> Hi all,
>>
>>        I am trying to convert some code I've written to use
>> FastqGeneralIterator rather than SeqIO.parse.  For the most part, it
>> works great and there is a big speed improvement.  However, I need to
>> be able to convert the quality scores of 6 characters from the
>> Illumina format to phred.  I can't seem to find the function to do
>> this.  I'm sure it must exist, and I apologize if documentation for
>> it is sitting right there in the tutorial - I can't seem to find it.
>> Can someone point me in the right direction?
>>
>> Cheers,
>> Alan
>
> Hi Alan,
>
> Probably something in Bio.SeqIO.QualityIO will do what you want;
> consult the module's built-in documentation via help(...) in Python
> or the online version here:
> http://www.biopython.org/DIST/docs/api/Bio.SeqIO.QualityIO-module.html
>
> I could be more precise if you could clarify what exactly it is you
> want to do with a couple of examples (input, desired output).
>
> If you just want a fast conversion from Solexa/Illumina FASTQ to
> Sanger FASTQ, or to a PHRED-style QUAL file, from within Python, use
> Bio.SeqIO.convert.
>
> Peter
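Peter's Bio.SeqIO.convert suggestion handles whole-file conversion in a single call. The sketch below shows both conversions he mentions, assuming Biopython is installed. It uses in-memory handles and a made-up one-read record in place of real files. With real data, filenames can be passed instead of handles.

```python
import io

from Bio import SeqIO

# A made-up single-read "fastq-illumina" record (qualities are PHRED + 64).
illumina_fastq = "@r1\nACGTAC\n+\nhhhhTB\n"

# Illumina 1.3+ FASTQ -> standard Sanger FASTQ (qualities become PHRED + 33).
sanger_out = io.StringIO()
count = SeqIO.convert(io.StringIO(illumina_fastq), "fastq-illumina",
                      sanger_out, "fastq")

# Illumina 1.3+ FASTQ -> PHRED-style QUAL file (space-separated integers).
qual_out = io.StringIO()
SeqIO.convert(io.StringIO(illumina_fastq), "fastq-illumina", qual_out, "qual")

print(count)
print(sanger_out.getvalue())
print(qual_out.getvalue())
```

This decodes every base in the file. For Alan's case, where only six characters per read matter, the per-read arithmetic with FastqGeneralIterator avoids that cost.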


