[Biopython] Reading from stdin with Bio.SeqIO

Peter biopython at maubp.freeserve.co.uk
Wed Jun 17 10:51:59 EDT 2009


On Fri, Jun 5, 2009 at 12:21 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jun 5, 2009 at 11:57 AM, Giles Weaver <giles.weaver at googlemail.com> wrote:
>> Thanks Brad, Peter,
>>
>> I did write code almost identical to the code that Brad posted, so I was on
>> the right track, but being new to Python I'm not familiar with interpreting
>> the error messages. Foolishly, I'd neglected to check that fastq-solexa was
>> supported in my Biopython install. Having replaced Biopython 1.49 (from the
>> Ubuntu repos) with 1.50 I seem to be in business.
>
> It's great that things are working now. Can you suggest how we
> might improve the "Unknown format 'fastq-solexa'" message you
> would have seen? Perhaps it could be longer and suggest checking
> for a newer version of Biopython?
>
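
A minimal sketch of the stdin-to-stdout conversion under discussion,
assuming Biopython 1.50 or later (where "fastq-solexa" is a recognised
format name). The function name solexa_to_sanger is hypothetical and only
for illustration; Bio.SeqIO.parse and Bio.SeqIO.write are the real calls.

```python
from Bio import SeqIO


def solexa_to_sanger(in_handle, out_handle):
    """Convert Solexa-style FASTQ to Sanger-style FASTQ.

    Both arguments are text handles, so sys.stdin and sys.stdout
    work just as well as open files. Returns the record count.
    """
    # parse() returns an iterator, so records are streamed one at a
    # time rather than loaded into memory all at once.
    records = SeqIO.parse(in_handle, "fastq-solexa")
    # write() maps the Solexa scores to PHRED scores on output.
    return SeqIO.write(records, out_handle, "fastq")
```

Calling solexa_to_sanger(sys.stdin, sys.stdout) from a script (after
"import sys") lets it sit in the middle of a shell pipeline. On an older
Biopython the parse call is where the "Unknown format" error would
appear.
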
>> I did have a look at the maq documentation at
>> http://maq.sourceforge.net/fastq.shtml and tried the script at
>> http://maq.sourceforge.net/fq_all2std.pl, but found that when I piped the
>> output into bioperl I got the following errors:
>>
>> MSG: Seq/Qual descriptions don't match; using sequence description
>> MSG: Fastq sequence/quality data length mismatch error
>>
>> The good news is that using Biopython instead of fq_all2std.pl I don't get
>> the data length mismatch error.
>
> Now that you mention this, I recall trying to email Heng Li about an
> apparent bug in fq_all2std.pl where the FASTQ quality string had an
> extra letter ("!") attached. I may not have the right email address as I
> never got a reply (on this issue, or regarding some missing brackets
> in the Perl formula on http://maq.sourceforge.net/fastq.shtml).

I have now forwarded the text of my original email about this possible
fq_all2std.pl bug to the MAQ users mailing list:

http://sourceforge.net/mailarchive/message.php?msg_name=320fb6e00906170708lb2ce4f7qbc5dfa43543189a2%40mail.gmail.com
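
For context, the usual Solexa to PHRED conversion is
Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The sketch below is a
plain-Python illustration of that formula, not the MAQ or Biopython
code; the function and constant names are hypothetical, and it assumes
the common ASCII offsets of 64 (Solexa) and 33 (Sanger).

```python
import math

SOLEXA_OFFSET = 64  # assumed ASCII offset for Solexa-style qualities
SANGER_OFFSET = 33  # assumed ASCII offset for Sanger-style qualities


def phred_from_solexa(q_solexa):
    """Map a Solexa quality score onto the PHRED scale (as a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def sanger_quality_string(solexa_string):
    """Re-encode a Solexa quality string as a Sanger quality string."""
    chars = []
    for letter in solexa_string:
        q_solexa = ord(letter) - SOLEXA_OFFSET
        q_phred = int(round(phred_from_solexa(q_solexa)))
        chars.append(chr(q_phred + SANGER_OFFSET))
    return "".join(chars)
```

Each input letter maps to exactly one output letter, so a converted
quality string should always be the same length as the sequence it
belongs to.
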

>> The descriptions mismatch error I'm not worried about, as it looks
>> like it's just BioPerl complaining because the (apparently optional)
>> quality description doesn't exist.
>
> Good. On large files it really does make sense to omit this extra string,
> but the FASTQ format is a little nebulous with multiple interpretations.

I gather from the BioPerl mailing list that this warning about missing
(optional) repeated descriptions on the "+" lines in FASTQ files will be
removed (or perhaps already has been removed).
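
To make the two checks concrete, here is a rough plain-Python sketch of
a FASTQ reader that accepts a bare "+" line and reports a
sequence/quality length mismatch. It is not how Biopython or BioPerl
parse FASTQ; the name iter_fastq is hypothetical, and it assumes each
record is exactly four lines with no line wrapping.

```python
def iter_fastq(handle):
    """Yield (title, sequence, quality) tuples from a FASTQ handle.

    Assumes unwrapped four-line records (an illustration only).
    """
    while True:
        title = handle.readline()
        if not title:
            return  # end of file
        title = title.rstrip("\r\n")
        if not title:
            continue  # skip blank lines between records
        if not title.startswith("@"):
            raise ValueError("Expected '@' line, got %r" % title)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not plus.startswith("+"):
            raise ValueError("Expected '+' line, got %r" % plus)
        # The repeated description is optional; only complain when it
        # is present and differs from the "@" line.
        if plus[1:] and plus[1:] != title[1:]:
            raise ValueError("Seq/qual descriptions differ for %s" % title[1:])
        if len(seq) != len(qual):
            raise ValueError("Seq/qual length mismatch for %s" % title[1:])
        yield title[1:], seq, qual
```

Passing sys.stdin as the handle would let this read from a pipe. A
quality string with one extra letter would fail the length check here.
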

Peter

