[Biopython-dev] [Biopython] when is a SeqRecord not a SeqRecord
Brad Chapman
chapmanb at 50mail.com
Wed Jul 18 18:29:36 UTC 2012
Dilara;
Apologies, I missed that the second mail had updated code.
> This works as you pointed out because filtered_rec is explicitly defined.
> Now if I want to do this
>
> from Bio import SeqIO
> mod = (check_meanQ(rec, q_thresholdd) for rec in
> SeqIO.parse("hiseq_pe_test.fastq", "fastq"))
> count = SeqIO.write(mod, "filtered_hiseq_pe_test.fastq", "fastq")
> print "Modified %i records" %count
>
> It doesn't work because of some of the records are None. So I tried doing
> this
The approach I'd take it to clean up check_meanQ to be explicit about
the return values:
> def check_meanQ(rec, q_threshold):
> seqlen=len(rec)
> quality_scores=array(rec.letter_annotations["phred_quality"])
> if round(quality_scores.mean()) <= q_threshold:
> print "Discarded ", rec.id, "because mean Q was",
> round(quality_scores.mean())
> badrec = None
> if round(quality_scores.mean()) > q_threshold:
> goodrec = rec
>
> return goodrec
def check_meanQ(rec, q_threshold):
quality_scores=array(rec.letter_annotations["phred_quality"])
if round(quality_scores.mean()) <= q_threshold:
print "Discarded ", rec.id, "because mean Q was", round(quality_scores.mean())
return None
else:
return rec
Then explicitly check for None values and remove them when writing:
> count = SeqIO.write(mod, "filtered_hiseq_pe_test.fastq", "fastq")
count = SeqIO.write((x for x in mod if x is not None),
"filtered_hiseq_pe_test.fastq", "fastq")
Hope this helps,
Brad
More information about the Biopython-dev
mailing list