[Biopython] Writing fasta+qual files and adjusting adapter clip positions in sff files
Martin Mokrejs
mmokrejs at fold.natur.cuni.cz
Wed Apr 6 13:54:41 UTC 2011
Hi Peter,
Peter Cock wrote:
> On Tue, Apr 5, 2011 at 2:21 PM, Martin Mokrejs
> <mmokrejs at fold.natur.cuni.cz> wrote:
>> Hi Peter,
>> I was looking into the Tutorial for a way to write fasta+qual files
>> but couldn't find it.
>
> Maybe I don't understand your question, but Bio.SeqIO.write(...)
> can be used to save as FASTA or as QUAL (call it twice).
Ah, I forgot, right.
>
>> I wanted to trim my objects assembled through
>> SeqIO.QualityIO.PairedFastaQualIterator.
>> Could _record[_start:_stop] be used?
>
> If I have understood your question correctly, yes. That function will
> parse a FASTA + QUAL pair and give you SeqRecord objects with
> sequence and quality. You can then slice each SeqRecord to apply
> trimming (underscores are not usual for variable names though).
OK, I just wasn't sure if the slicing works this way and was a bit lazy
to test myself to yield a new object with shorter sequences and qual values.
>
>> Anyways, I found that there is a way to convert sff files into re-trimmed
>> sff files which is even closer to my goal. Here is the help text from SffIO:
>>
>> >>> from Bio import SeqIO
>> >>> def filter_and_trim(records, primer):
>> ... for record in records:
>> ... if record.seq[record.annotations["clip_qual_left"]:].startswith(primer):
>> ... record.annotations["clip_qual_left"] += len(primer)
>> ... yield record
>> >>> records = SeqIO.parse("Roche/E3MFGYR02_random_10_reads.sff", "sff")
>> >>> count = SeqIO.write(filter_and_trim(records,"AAAGA"),
>> ... "temp_filtered.sff", "sff")
>> >>> print "Selected %i records" % count
>> Selected 2 records
>>
>
> That is showing how to edit the trim values in order to write out an
> updated SFF file.
Yes.
>
>>
>> And this code from the Tutorial:
>>
>>>>> from Bio import SeqIO
>>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.fasta", "fasta")
>> 10
>>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.qual", "qual")
>> 10
>>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.fastq", "fastq")
>> 10
>>
>> My questions is: could I provide readnames with clip_adapter_right directly to
>> SeqIO.convert()?
>
> No, the Bio.SeqIO.convert(...) function is deliberately simple and inflexible.
Pity. Probably because the sff file can be indexed it would be fast if I provide
the function with a handle to (even in unsorted order):
GQF67IL01D9394 5-89
GQF67IL01AM9KN 5-87
GQF67IL01BIDWF 5-135
GQF67IL01D5PMS 5-97
GQF67IL01AONRB 5-60
GQF67IL01BNA85 5-0
>
>> Well I will probably stick to 'sfffile -t trimpositions.txt myfile.sff'
>> anyways hoping it will be faster. ;)
>
> If you have the trim positions in a suitable text file, and you want
> to apply them
> to an SFF file, and you are running on Linux so you can use sfffile, then that
> would work an may well be faster.
Yes, sfffile works for me while I probably see some bug in it (have to re-test
to be sure).
>
> I'm a bit confused if you are trying to write out a new trimmed SFF file, a pair
> of trimmed FASTA and QUAL files, or even a trimmed FASTQ file. All of
> those are possible with Biopython.
I wanted either trimmed fasta+qual or trimmed sff (preferably) both with my _new_
trim points. From the above it is now clear for fasta+qual it can be done through
biopython while for sff alterations/creations I have to stick to sfffile (which
is fine for me).
Thanks,
Martin
More information about the Biopython
mailing list