[Biopython] Writing fasta+qual files and adjusting adapter clip positions in sff files
Peter Cock
p.j.a.cock at googlemail.com
Tue Apr 5 09:38:49 EDT 2011
On Tue, Apr 5, 2011 at 2:21 PM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
> Hi Peter,
> I was looking into the Tutorial for a way to write fasta+qual files
> but couldn't find it.
Maybe I don't understand your question, but Bio.SeqIO.write(...)
can be used to save as FASTA or as QUAL (call it twice).
> I wanted to trim my objects assembled through
> SeqIO.QualityIO.PairedFastaQualIterator.
> Could _record[_start:_stop] be used?
If I have understood your question correctly, yes. That function will
parse a FASTA + QUAL pair and give you SeqRecord objects with
sequence and quality. You can then slice each SeqRecord to apply
trimming (underscores are not usual for variable names though).
> Anyways, I found that there is a way to convert sff files into re-trimmed
> sff files which is even closer to my goal. Here is the help text from SffIO:
>
> >>> from Bio import SeqIO
> >>> def filter_and_trim(records, primer):
> ... for record in records:
> ... if record.seq[record.annotations["clip_qual_left"]:].startswith(primer):
> ... record.annotations["clip_qual_left"] += len(primer)
> ... yield record
> >>> records = SeqIO.parse("Roche/E3MFGYR02_random_10_reads.sff", "sff")
> >>> count = SeqIO.write(filter_and_trim(records,"AAAGA"),
> ... "temp_filtered.sff", "sff")
> >>> print "Selected %i records" % count
> Selected 2 records
>
That is showing how to edit the trim values in order to write out an
updated SFF file.
>
> And this code from the Tutorial:
>
>>>> from Bio import SeqIO
>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.fasta", "fasta")
> 10
>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.qual", "qual")
> 10
>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.fastq", "fastq")
> 10
>
> My questions is: could I provide readnames with clip_adapter_right directly to
> SeqIO.convert()?
No, the Bio.SeqIO.convert(...) function is deliberately simple and inflexible.
> Well I will probably stick to 'sfffile -t trimpositions.txt myfile.sff'
> anyways hoping it will be faster. ;)
If you have the trim positions in a suitable text file, and you want
to apply them
to an SFF file, and you are running on Linux so you can use sfffile, then that
would work an may well be faster.
I'm a bit confused if you are trying to write out a new trimmed SFF file, a pair
of trimmed FASTA and QUAL files, or even a trimmed FASTQ file. All of
those are possible with Biopython.
Peter
More information about the Biopython
mailing list