[BioPerl] Re: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script

Jason Stajich jason at cgt.duhs.duke.edu
Fri Sep 12 10:54:21 EDT 2003


This happens because the newly created DB::RefSeq object doesn't
inherit the input params of the DB::GenBank object.

Applied a fix in CVS.

Added a method, refseq_db, to Bio::DB::NCBIHelper, which you can use to
get/set the Bio::DB::RefSeq object.  This should also be a little
smarter/faster for repeated queries on the same db handle since it
caches the RefSeq handle.

Interestingly, we have only implemented RefSeq retrieval from the EBI
server - someone should look into how best to retrieve RefSeqs from Entrez
as well and make a RefSeqEntrez interface.

Setting $db->verbose(-1) will prevent the redirection.
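
A minimal sketch of a CGI script that combines the two settings mentioned
in this thread (-retrievaltype => 'io_string' and verbose(-1)).  The helper
name summary_for and the GATEWAY_INTERFACE check are assumptions made for
illustration, and this has not been tested against a live server:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a one-line plain-text summary for an accession.
# $db is any object with a get_Seq_by_acc method (e.g. a Bio::DB::GenBank).
sub summary_for {
    my ($db, $acc) = @_;
    my $seq = eval { $db->get_Seq_by_acc($acc) };
    return "Could not retrieve $acc\n" unless $seq;
    return sprintf("%s\t%d bp\n", $seq->display_id, $seq->length);
}

# Only contact the remote server when running under a web server
# (CGI sets GATEWAY_INTERFACE in the environment).
if ($ENV{GATEWAY_INTERFACE}) {
    require Bio::DB::GenBank;

    # Hold the sequence in memory instead of using the fork-and-pipe path.
    my $db = Bio::DB::GenBank->new(-retrievaltype => 'io_string');

    # Setting suggested above for RefSeq identifiers.
    $db->verbose(-1);

    my $body = summary_for($db, 'NC_003992');
    print "Content-type: text/plain\n\n", $body;
}
```

The fetch is wrapped in eval so that a failed retrieval still produces a
valid response rather than a server error.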
-jason
On Fri, 12 Sep 2003, Mark Wilkinson wrote:

> there is actually a similar problem somewhere else in the code.  Even if
> you use retrievaltype => 'io_string', there are cases where it will fail
> with the same symptoms.
>
> If you try to do a get_Seq_by_acc using a RefSeq identifier (e.g.
> NC_003992) you get the following warning in your errorlog:
>
> -------------------- WARNING ---------------------
> MSG: [gb|NC_003992] is not a normal sequence database but a RefSeq
> entry. Redirecting the request.
>
> Unfortunately, somewhere in that redirection something is printed to
> STDOUT because the next message is:
>
>
> ---------------------------------------------------
> [Fri Sep 12 11:29:50 2003] [error] [client 24.78.208.156] malformed
> header from script. Bad header=LOCUS       NC_003992         :
> Services.cgi
> [Fri Sep 12 11:29:50 2003] [warn] /cgi-bin/Services.cgi did not send an
> HTTP header
>
> So, this re-direction fails in a CGI environment :-(
>
> Same problem with retrievaltype => 'tempfile'
>
> M
>
>
>
> On Tue, 2003-09-09 at 13:47, Lincoln Stein wrote:
> > Sorry about any confusion this caused.  However, it is mentioned in the docs
> > for WebDBSeqI.  Perhaps the default should be changed to "tempfile", which
> > should work in all cases.
> >
> > Lincoln
> >
> > On Wednesday 20 August 2003 01:05 pm, Jason Stajich wrote:
> > > > So your script is doing what it's supposed to, it's just that some other
> > > > stuff is getting out on STDOUT before your webserver is able to get in
> > > > on the act.
> > > >
> > > > Having played a bit, this proves to be interesting:
> > > >
> > > > #!/usr/bin/perl -w
> > > > use strict;
> > > > use Bio::DB::GenBank;
> > > >
> > > > close STDOUT;
> > > >
> > > > my $d = Bio::DB::GenBank->new();
> > > > my $seq = $d -> get_Seq_by_gi('163483');
> > > >
> > > >
> > > > This gives me:
> > > >
> > > > print() on closed filehandle STDOUT at
> > > > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701
> > > >
> > > > > So WebDBSeqI.pm is usurping STDOUT as part of its query.  This probably
> > > > > explains what you're getting.  Apache will redirect STDOUT straight to
> > > > > the return stream for the connection.  This means it gets the output
> > > > > intended for WebDBSeqI and it appears in your program's output.  You then
> > > > > get the output you printed.
> > >
> > > > This is part of Lincoln's rechaining of the IO and using fork.  From
> > > > his comments in the code:
> > >     # Try to create a stream using POSIX fork-and-pipe facility.
> > >     # this is a *big* win when fetching thousands of sequences from
> > >     # a web database because we can return the first entry while
> > >     # transmission is still in progress.
> > >     # Also, no need to keep sequence in memory or in a temporary file.
> > >     # If this fails (Windows, MacOS 9), we fall back to non-pipelined
> > >     # access.
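
The fork-and-pipe idea described in those comments can be sketched in core
Perl as follows.  This only illustrates the general technique and is not the
code in WebDBSeqI.pm; stream_from_child and the fake LOCUS records are made
up for the example:

```perl
use strict;
use warnings;

# Run $producer (a code ref that prints to STDOUT) in a child process whose
# STDOUT is attached to a pipe.  Returns the read end and the child's pid so
# the parent can consume records while the child is still producing them.
sub stream_from_child {
    my ($producer) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: point STDOUT at the write end of the pipe.
        close $reader;
        open(STDOUT, '>&', $writer) or die "cannot dup onto STDOUT: $!";
        $| = 1;
        $producer->();
        close STDOUT;
        exit 0;
    }
    # Parent: keep only the read end.
    close $writer;
    return ($reader, $pid);
}

my ($fh, $pid) = stream_from_child(sub { print "LOCUS rec$_\n" for 1 .. 3 });
my @lines = <$fh>;    # a real consumer would parse records one at a time
close $fh;
waitpid($pid, 0);
print STDERR "parent received: $_" for @lines;
```

The child depends on being able to re-point STDOUT, which is why an
environment that treats STDOUT specially (such as a web server) can end up
seeing output meant for the pipe.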
> > >
> > > > You can turn this off by adding the following to the DB::GenBank init:
> > > my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string');
> > >
> > > > -retrievaltype => 'io_string' (holds the sequence in memory before
> > > >                                 parsing)
> > > >  or
> > > > -retrievaltype => 'tempfile'  (uses tempfiles, but I'm not 100% sure
> > > >                                 this code has gotten a workout; the
> > > >                                 tempfiles may not be cleaned up until
> > > >                                 the program exits, which might be a
> > > >                                 problem for scripts running under
> > > >                                 mod_perl)
> > >
> > > > If this is right, you should have some interesting error messages in
> > > > your logs if you run your script with warnings enabled.
> > > >
> > > > > I can't see an immediate fix for this, short of running your fetch as a
> > > > > completely detached process with a separate STDOUT, but that kind of
> > > > > defeats the point of using mod_perl.  The use of a pipe from STDOUT to
> > > > > read the results of a web query seems pretty ingrained in WebDBSeqI.pm
> > > > > and it may not be trivial to change it.
> > > >
> > > > Maybe others will be able to think of a simpler work-round?
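
One possible workaround under plain CGI, sketched in core Perl: point STDOUT
at the null device for the duration of the fetch and restore it afterwards.
The helper name with_stdout_silenced is hypothetical, this has not been
tested against WebDBSeqI, and it may not help under mod_perl if STDOUT is
tied there:

```perl
use strict;
use warnings;
use File::Spec;

# Run $code with STDOUT pointed at the null device, then restore the
# original STDOUT.  Anything printed to STDOUT inside $code is discarded.
sub with_stdout_silenced {
    my ($code) = @_;

    # Keep a duplicate of the current STDOUT so it can be restored later.
    open(my $saved, '>&', \*STDOUT) or die "cannot dup STDOUT: $!";
    open(STDOUT, '>', File::Spec->devnull)
        or die "cannot redirect STDOUT: $!";

    # Trap errors so STDOUT is restored even if $code dies.
    my @result = eval { $code->() };
    my $err = $@;

    open(STDOUT, '>&', $saved) or die "cannot restore STDOUT: $!";
    close $saved;

    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Usage: the stray "LOCUS" line is swallowed, the return value survives.
my $value = with_stdout_silenced(sub { print "LOCUS stray output\n"; return 7 });
print STDERR "callback returned $value\n";
```

In a CGI script the callback would wrap the sequence fetch, and the HTTP
header would be printed only after STDOUT has been restored.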
> > > >
> > > >
> > > > Simon.
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

