[Biopython] Writing fasta+qual files and adjusting adapter clip positions in sff files

Peter Cock p.j.a.cock at googlemail.com
Wed Apr 6 10:26:58 EDT 2011


[Forgot to include the list, you'll get this twice Martin - sorry]

On Wed, Apr 6, 2011 at 3:17 PM, Martin Mokrejs wrote:
> Peter Cock wrote:
>> On Wed, Apr 6, 2011 at 2:54 PM, Martin Mokrejs wrote:
>>
>>> ... while for sff alterations/creations I have to stick to sfffile (which
>>> is fine for me).
>>
>> No, you can do that in Biopython too - very similar to the example you
>> quoted. You load the SFF file in, move the trim points by changing
>> the values of record.annotations["clip_qual_left"] and/or
>> record.annotations["clip_qual_right"] then save this as a new
>> SFF file. Note you need to use Python zero-based counting.
>
> And this kind of awkward as I have to sort the list of readnames with trimpoints
> to make the code efficient. Well, I haven't tried that but think the convert()
> function could take the advantage of the sff file index internally and accept the
> input in any order and just work well. Just a thought. As I wrote, I anyways use
> sfffile-based approach at the very moment.
>

Assuming your list of trim points is not in the same order as the
reads in the SFF file, then just use Bio.SeqIO.index(...) to read
the SFF file and give you random access to the reads. Then
loop over the table of trim points, extract the read, apply the
trimming, and yield the updated record. Something like this:

def trimmed_records(trim_data, indexed_sff):
    """Generator function returning SeqRecord objects."""
    for read_id, start, end in trim_data:
        # Random access by read name via the SeqIO.index(...) object
        record = indexed_sff[read_id]
        # Trim points use Python zero-based counting
        record.annotations["clip_qual_left"] = start
        record.annotations["clip_qual_right"] = end
        yield record

from Bio import SeqIO
indexed_sff = SeqIO.index("my_file.sff", "sff")
trim_data = ... #create a list or generator of 3-tuples
SeqIO.write(trimmed_records(trim_data, indexed_sff), "trimmed.sff", "sff")
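
As a rough sketch of how the trim_data placeholder might be filled in,
assuming the trim points live in a tab-separated table of read name,
start and end (the column layout, the helper name read_trim_table and
the read names below are all hypothetical, and the values are assumed
to already be in Python zero-based counting):

```python
import csv
import io


def read_trim_table(handle):
    """Yield (read_id, start, end) tuples from a tab-separated handle.

    Assumed layout (hypothetical): read name, left trim, right trim.
    Blank lines and lines starting with '#' are skipped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield row[0], int(row[1]), int(row[2])


# Stand-in for a real file handle; the read names are made up.
example = io.StringIO("# read\tstart\tend\nREAD_A\t4\t120\nREAD_B\t0\t95\n")
trim_rows = list(read_trim_table(example))
```

Because this is a generator, it could be passed directly as the
trim_data argument without loading the whole table into memory.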

In this case, since you already have a tabular file of the trim data,
using the Roche tool is very sensible (and, as you say, may be faster).

Peter

