[Biopython] Reading from stdin with Bio.SeqIO

Giles Weaver giles.weaver at googlemail.com
Fri Jun 5 10:57:41 UTC 2009


Thanks Brad, Peter,

I did write code almost identical to the code that Brad posted, so I was on
the right track, but being new to Python I'm not familiar with interpreting
the error messages. Foolishly, I'd neglected to check that fastq-solexa was
supported in my Biopython install. Having replaced Biopython 1.49 (from the
Ubuntu repos) with 1.50 I seem to be in business.

I did have a look at the maq documentation at
http://maq.sourceforge.net/fastq.shtml and tried the script at
http://maq.sourceforge.net/fq_all2std.pl, but found that when I piped the
output into bioperl I got the following errors:

MSG: Seq/Qual descriptions don't match; using sequence description
MSG: Fastq sequence/quality data length mismatch error

The good news is that using Biopython instead of fq_all2std.pl I don't get
the data length mismatch error. The descriptions mismatch error I'm not
worried about, as it looks like its just bioperl complaining because the
(apparently optional) quality description doesn't exist.

There is a recent thread on the bioperl mailing lists where Heikki
Lehvaslaiho has written a very detailed post (
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030017.html) on the
peculiarities of sanger/solexa/illumina quality encoding. Evidently there
are a lot of pitfalls for the unwary, and there may be issues with the maq
implementation. If the maq script was used as a reference for the biopython
version you may want to check that the same issues haven't been replicated
in biopython.

Thanks again for the help.

Giles


2009/6/4 Peter <biopython at maubp.freeserve.co.uk>

> On Thu, Jun 4, 2009 at 5:04 PM, Giles Weaver
> <giles.weaver at googlemail.com> wrote:
> > Hi,
> >
> > I'm new to biopython, having used bioperl and biosql for some time. I
> need
> > to convert a solexa format fastq file into a sanger format fastq file.
> This
> > isn't yet possible in bioperl as there isn't a bioperl parser for solexa
> > fastq yet, so I thought I'd give biopython a go.
> >
> > I want to right the biopython equivalent of the following:
> > ...
> > Unfortunately I can't find any documentation on how to read from or write
> to
> > Unix pipes with Bio.SeqIO.
> > Can anyone help?
>
> Brad has kindly posted a solution - four lines of python code for the
> whole script (but with the format names hard coded).
>
> Our tutorial does try and emphasise that Bio.SeqIO works with handles,
> which can be open files (as in most of the examples), internet
> connections, output from command lines (as in some of our example), or
> indeed the standard input/output pipes for the python script itself
> (if run at the command line). I hadn't considered including an example
> of this in the main tutorial on the grounds it would probably only of
> interest to people already familiar with the Unix command line. But
> Brad is right, this would make a nice wiki cookbook entry.
>
> Peter
>
> P.S. If you do want a perl solution, there is a script included with
> maq which I found quite handy as a reference implementation for
> Biopython.
> http://maq.sourceforge.net/fq_all2std.pl
>



More information about the Biopython mailing list