[Biopython] internal function to convert illumina quality scores to phred
Alan Bergland
bergland at stanford.edu
Mon Jan 31 19:54:23 UTC 2011
Hi Peter,
Thanks for the quick reply. I am trying to iterate through two
large FASTQ files (one file per end of the paired-end reads) and split
the reads by one of 9 barcodes found at the 5' end of both reads in
each pair. I would like to use the quality information for those
barcode bases to assess which barcode group each read belongs to.
I think it would be nice to use FastqGeneralIterator because I don't
need to translate the quality scores for the full read (100bp) back
and forth while I iterate through the file. I gather that when I use
SeqIO.parse and SeqIO.write, the quality scores are converted back and
forth. There is no need to do this for the whole read.
I've written a little snippet of code that simply prints the quality
scores from the barcodes:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqIO.QualityIO import FastqGeneralIterator
pe1 = open("head2_pe1.fastq", "r")
pe2 = open("head2_pe2.fastq", "r")
pe1_record_it = FastqGeneralIterator(pe1)
for pe1_seq_record in pe1_record_it:
    # Each item is a (title, sequence, quality) tuple of plain strings
    bc = SeqRecord(Seq(pe1_seq_record[1][:6]), id="a")
    # Store the raw, still Illumina-encoded quality characters of the barcode
    bc.letter_annotations["fastq-illumina"] = pe1_seq_record[2][:6]
    print(bc.letter_annotations["fastq-illumina"])
This just prints out the Illumina-encoded quality scores. How would I
print out the PHRED scores instead?
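
For reference, here is a minimal sketch of the conversion being asked about. It assumes the files use the Illumina 1.3+ encoding, where each quality character is the PHRED score plus an ASCII offset of 64. Older Solexa files store Solexa scores rather than PHRED scores, and Illumina 1.8+ files use an offset of 33, so the offset would need checking against the actual data. The helper name illumina_to_phred is hypothetical and not part of Biopython.

```python
# Assumption: Illumina 1.3+ FASTQ encoding (PHRED score + 64).
ILLUMINA_OFFSET = 64


def illumina_to_phred(quality_string, offset=ILLUMINA_OFFSET):
    """Return a list of integer PHRED scores for an encoded quality string."""
    return [ord(char) - offset for char in quality_string]


# Example: quality characters for a 6-base barcode.
barcode_quals = illumina_to_phred("hhhhhB")
print(barcode_quals)  # [40, 40, 40, 40, 40, 2]
```

Inside the loop above, this could be applied to pe1_seq_record[2][:6] so that only the six barcode characters are decoded and the rest of the read is left untouched.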
Thanks,
Alan
On Jan 31, 2011, at 11:36 AM, Peter wrote:
> On Mon, Jan 31, 2011 at 6:50 PM, Alan Bergland
> <bergland at stanford.edu> wrote:
>> Hi all,
>>
>> I am trying to convert some code I've written to use
>> FastqGeneralIterator rather than SeqIO.parse. For the most part, it
>> works great and there is a big speed improvement. However, I need to
>> be able to convert the quality scores of 6 characters from the
>> Illumina format to phred. I can't seem to find the function to do
>> this. I'm sure it must exist, and I apologize if documentation for it
>> is sitting right there in the tutorial - I can't seem to find it. Can
>> someone point me in the right direction?
>>
>> Cheers,
>> Alan
>
> Hi Alan,
>
> Probably something in Bio.SeqIO.QualityIO will do what you want.
> Consult the module's built-in documentation via help(...) in Python
> or the online version, which is here:
> http://www.biopython.org/DIST/docs/api/Bio.SeqIO.QualityIO-module.html
>
> I could be more precise if you could clarify what exactly it is you
> want to do with a couple of examples (input, desired output).
>
> If you just want a fast conversion from Solexa/Illumina FASTQ to
> Sanger FASTQ or to a PHRED-style QUAL file from within Python, use
> Bio.SeqIO.convert for this.
>
> Peter
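
As a rough illustration of the Bio.SeqIO.convert suggestion above, the sketch below converts a single made-up record from Illumina 1.3+ FASTQ to Sanger FASTQ. The record and the in-memory handles are assumptions for demonstration only. With real data the two handles would be replaced by file names or open file handles.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical one-record input in Illumina 1.3+ encoding (offset 64).
illumina_fastq = "@read1\nACGTAC\n+\nhhhhhB\n"

in_handle = StringIO(illumina_fastq)
out_handle = StringIO()

# "fastq-illumina" is the Illumina 1.3+ variant; "fastq" is the Sanger variant.
count = SeqIO.convert(in_handle, "fastq-illumina", out_handle, "fastq")

sanger_fastq = out_handle.getvalue()
print("Converted %i record(s)" % count)
print(sanger_fastq)
```

Passing "qual" as the output format instead of "fastq" would write the PHRED scores as integers in a QUAL file rather than as encoded characters.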