[BioPerl] Re: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script

Fri Sep 12 10:35:44 EDT 2003

There is actually a similar problem somewhere else in the code.  Even if
you use -retrievaltype => 'io_string', there are cases where it fails
with the same symptoms.

If you call get_Seq_by_acc with a RefSeq identifier (e.g. NC_003992),
you get the following warning in your error log:

-------------------- WARNING ---------------------
MSG: [gb|NC_003992] is not a normal sequence database but a RefSeq
entry. Redirecting the request.

Unfortunately, something in that redirection prints to STDOUT, because
the next message is:

---------------------------------------------------
[Fri Sep 12 11:29:50 2003] [error] [client 24.78.208.156] malformed header from script. Bad header=LOCUS       NC_003992         : Services.cgi
[Fri Sep 12 11:29:50 2003] [warn] /cgi-bin/Services.cgi did not send an HTTP header

So this redirection fails in a CGI environment :-(

The same problem occurs with -retrievaltype => 'tempfile'.

M

On Tue, 2003-09-09 at 13:47, Lincoln Stein wrote:
> Sorry about any confusion this caused.  However, it is mentioned in the docs 
> for WebDBSeqI.  Perhaps the default should be changed to "tempfile", which 
> should work in all cases.
> 
> Lincoln
> 
> On Wednesday 20 August 2003 01:05 pm, Jason Stajich wrote:
> > > So your script is doing what it's supposed to; it's just that some other
> > > output is getting out on STDOUT before your web server is able to get in
> > > on the act.
> > >
> > > Having played a bit, I found this interesting:
> > >
> > > #!/usr/bin/perl -w
> > > use strict;
> > > use Bio::DB::GenBank;
> > >
> > > # Close STDOUT so that any later print() to it raises a warning.
> > > close STDOUT;
> > >
> > > my $d = Bio::DB::GenBank->new();
> > > my $seq = $d->get_Seq_by_gi('163483');
> > >
> > >
> > > This gives me:
> > >
> > > print() on closed filehandle STDOUT at
> > > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701
> > >
> > > So WebDBSeqI.pm is usurping STDOUT as part of its query.  This probably
> > > explains what you're getting.  Apache will redirect STDOUT straight to
> > > the return stream for the connection.  This means it gets the output
> > > intended for WebDBSeqI and it appears in your program's output.  You
> > > then get the output you printed.
> >
> > This is part of Lincoln's rechaining of the IO using fork; see his
> > comments in the code:
> >     # Try to create a stream using POSIX fork-and-pipe facility.
> >     # this is a *big* win when fetching thousands of sequences from
> >     # a web database because we can return the first entry while
> >     # transmission is still in progress.
> >     # Also, no need to keep sequence in memory or in a temporary file.
> >     # If this fails (Windows, MacOS 9), we fall back to non-pipelined
> >     # access.
> >
> > You can turn this off by passing -retrievaltype to the Bio::DB::GenBank
> > constructor:
> > my $db = Bio::DB::GenBank->new(-retrievaltype => 'io_string');
> >
> > -retrievaltype => 'io_string' (for in-memory holding of the sequence
> >                                 before parsing)
> >  or
> > -retrievaltype => 'temp'      (for use of tempfiles, but I'm not 100%
> >                                 sure this code has had a workout; the
> >                                 tempfiles may not be cleaned up until the
> >                                 program exits, which might be a problem
> >                                 for scripts running under mod_perl)
> >
> > > If this is right, you should have some interesting error messages in
> > > your logs if you run your script with warnings enabled.
> > >
> > > I can't see an immediate fix for this, short of running your fetch as a
> > > completely detached process with a separate STDOUT, but that kind of
> > > defeats the point of using mod_perl.  The use of a pipe from STDOUT to
> > > read the results of a web query seems pretty ingrained in WebQueryI.pm,
> > > and it may not be trivial to change.
> > >
> > > Maybe others will be able to think of a simpler work-round?
> > >
> > >
> > > Simon.
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
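The fork-and-pipe idea in Lincoln's code comment, and why it ties the fetch to STDOUT, can be illustrated with core Perl alone. This is a minimal sketch, not BioPerl's code: `stream_records` is a hypothetical helper standing in for the HTTP fetch. A forking `open` with `'-|'` returns a read handle to the parent and connects the child's STDOUT to the write end of the pipe, so the child delivers its data simply by printing. If STDOUT has already been closed at the Perl level (as in Simon's `close STDOUT` test), those prints fail instead, which is consistent with the "print() on closed filehandle STDOUT" warning he reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): fork a child that "fetches"
# records and writes them to its STDOUT, which the forking open() has
# connected to a pipe. The parent gets the read end back immediately.
sub stream_records {
    my @records = @_;
    my $pid = open(my $fh, '-|');    # forking open: returns in both processes
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: STDOUT is now the write end of the pipe, so a plain
        # print() sends data to the parent rather than to the terminal
        # or web server.
        $| = 1;
        print "$_\n" for @records;
        exit 0;
    }
    # Parent: the caller can start parsing the first record while the
    # child may still be writing the rest.
    return $fh;
}

my $fh = stream_records('LOCUS A', 'LOCUS B', 'LOCUS C');
my @got;
while (my $line = <$fh>) {
    chomp $line;
    push @got, $line;
}
close $fh;    # waits for (reaps) the child process
print join(',', @got), "\n";
```

The design win is the one the comment names: the first entry can be parsed while transmission is still in progress. The cost is that the child borrows STDOUT, which is exactly the channel a CGI script uses to talk to the web server.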
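For contrast, the 'io_string' option Jason describes holds the whole entry in memory before parsing, so the fetch itself need not go through a child's STDOUT. (Mark reports above that the RefSeq redirection still writes to STDOUT even with 'io_string', so this does not cover every path.) A rough core-Perl analogue follows, assuming a hypothetical `fetch_entry_as_string` that stands in for the HTTP request and returns placeholder text. BioPerl's option is presumably named after IO::String; this sketch uses Perl's built-in in-memory filehandle (`open` on a scalar reference) instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the web request: returns the complete
# response body as one string (placeholder text, not real GenBank data)
# instead of streaming it through a pipe.
sub fetch_entry_as_string {
    my ($id) = @_;
    return "LOCUS       $id\nDEFINITION  placeholder record\n//\n";
}

# Read the buffered response through an in-memory filehandle. No fork
# is involved and this sketch never writes to STDOUT while fetching.
my $buffer = fetch_entry_as_string('NC_003992');
open(my $in, '<', \$buffer) or die "cannot open in-memory handle: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;
    push @lines, $line;
}
close $in;
print scalar(@lines), " lines, first: $lines[0]\n";
```

The trade-off is the reverse of the pipelined mode: nothing can be parsed until the full response has arrived, and the whole entry sits in memory.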
-- 
Mark Wilkinson <markw at illuminae.com>
Illuminae