[Bioperl-l] genpept/swiss

Andrew Dalke dalke@acm.org
Mon, 4 Sep 2000 17:44:34 -0600


Kris Boulez <krbou@pgsgent.be>:
>While at that moment I want to
>be able to say that he can ignore the warning if (e.g.) he has read the
>sequence correctly.

With Martel it means making a new parser, but I've tried to make it
easy to describe new ones.

For example, to read only the sequence fields in a GenBank-style
record (the "ORIGIN" line is GenBank's marker; EMBL uses "SQ"),
you might use the following, untested code:

not_sequence_header = Re("(?!ORIGIN).*\n")
sequence_header = Str("ORIGIN\n")
sequence_line = Re(" *\d+ (?P<sequence>.*)\n")
record_end = Str("//\n")

record = Rep(not_sequence_header) + \
         sequence_header + \
         Group("sequence_block", Rep(sequence_line)) + \
         record_end

parser = record.make_parser()
 ...

In other words, this reads lines until it finds the "ORIGIN" line, then
the next lines (up to the "//\n") are read in as sequence data.
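The same line-by-line logic can be sketched without Martel, using only
Python's standard "re" module. This is not Martel's API, just a minimal
illustration of the grammar above; the function name read_sequence and the
final joining and space-stripping step are assumptions added for the example.

```python
import re

# Same pattern as sequence_line above: optional spaces, a position
# number, one space, then the sequence text up to the newline.
SEQUENCE_LINE = re.compile(r" *\d+ (?P<sequence>.*)\n")


def read_sequence(record_text):
    """Return the sequence of one record, with spaces removed.

    Hypothetical helper: it mirrors the grammar above, but is slightly
    more lenient in that any line starting with ORIGIN ends the header.
    """
    lines = iter(record_text.splitlines(keepends=True))

    # Skip header lines until the ORIGIN line is found.
    for line in lines:
        if line.startswith("ORIGIN"):
            break
    else:
        raise ValueError("no ORIGIN line found")

    # Collect sequence lines until the record terminator.
    chunks = []
    for line in lines:
        if line == "//\n":
            return "".join(chunks).replace(" ", "")
        match = SEQUENCE_LINE.fullmatch(line)
        if match is None:
            raise ValueError("unexpected line in sequence block: %r" % line)
        chunks.append(match.group("sequence"))
    raise ValueError("record is not terminated by //")


sample = (
    "LOCUS       TEST\n"
    "DEFINITION  made-up example record\n"
    "ORIGIN\n"
    "        1 gatcctccat atacaacggt\n"
    "       21 ccgacatgag\n"
    "//\n"
)
print(read_sequence(sample))
```

The sample record is invented for the example; a real file would hold many
records, each ending in "//".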

                    Andrew



Date: Fri, 1 Sep 2000 15:23:10 +0100 (BST)
From: James Gilbert <jgrg@sanger.ac.uk>
To: Bioperl <vsns-bcd-perl@lists.uni-bielefeld.de>
Message-ID: <Pine.OSF.4.21.0009011519250.20887-100000@caldy.sanger.ac.uk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: [Bioperl-l] minor EMBL and GenBank fixes
Sender: bioperl-l-admin@bioperl.org
Errors-To: bioperl-l-admin@bioperl.org
X-BeenThere: bioperl-l@bioperl.org
X-Mailman-Version: 2.0beta2
Precedence: bulk
List-Id: Bioperl Project Discussion List <bioperl-l.bioperl.org>


I've fixed SeqIO::embl and SeqIO::genbank, where
the accession_number method was returning
"unknown".  This was because accession_number was
being added to the Bio::Seq object during the
header parsing loop.  The accession is actually stored
in the Bio::PrimarySeq object, but a new
Bio::PrimarySeq object was being made after the
sequence was parsed (unnecessary, because one is made
along with the new Bio::Seq object).

	James

James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919


Message-Id: <39AE0221.5892184C@genelogic.com>
Date: Wed, 30 Aug 2000 23:58:41 -0700
From: "Mike Cariaso" <mcariaso@genelogic.com>
Organization: Gene Logic
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
Mime-Version: 1.0
Cc: vsns-bcd-perl@lists.uni-bielefeld.de
References: <Pine.LNX.4.10.10003031640320.11827-100000@gene.pbi.nrc.ca>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: [Bioperl-l] Bio::DB::Genbank && Firewalls

As reported, Bio::DB::GenBank and Bio::DB::GenPept do not work through firewalls.
I've put together some code to make them work, but my code uses LWP::UserAgent
and HTTP::Request. I also see (after this was done) that AJM has submitted a
non-LWP patch.


It doesn't yet have any error checking, and I'd eventually like to set it up so
that it's only used if there is a firewall or a flag set, to avoid introducing
another external library dependency, but I was hoping others might look it over,
and perhaps explain the right way to prepare a patch. I'm familiar with diff
already, and I know there are lots of little things I could do to contribute, but
the first submission is a bit of a hurdle for me.




The fix is a drop-in replacement for the subroutine _get_stream in
Bio::DB::GenBank.pm.


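sub _get_stream {
  my($self, $entrez) = @_;

  # env_proxy loads proxy settings (e.g. http_proxy) from the environment.
  my $ua = LWP::UserAgent->new();
  $ua->env_proxy;

  my $url = 'http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?' . $entrez;
  my $req = HTTP::Request->new(GET => $url);
  my $res = $ua->request($req);
  my $fh = IO::File->new_tmpfile();

  # Strip everything up to and including the first run of five or more
  # dashes that ends a line.
  my $content = $res->content;
  $content =~ s/^.*?-----+\n//s;
  $fh->print($content);
  $fh->flush;
  # Rewind so Bio::SeqIO reads from the start of the temp file
  # (IO::File has no reset method; seek comes from IO::Seekable).
  $fh->seek(0, 0);

  return Bio::SeqIO->new('-fh' => $fh, '-format' => 'Fasta');
}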


You'll need to add

use LWP::UserAgent;
use HTTP::Request;
use IO::File;

somewhere near the other 'use'  statements at the top.


The exact same code will also need to be dropped in for GenPept, which shows a
strong need to move this into some base class or common lib, but that can wait
for now.
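For anyone who wants to check what the substitution in _get_stream does
without hitting NCBI, here is a small Python sketch of the same step. The
function name and the sample response text are made up for illustration;
only the regular expression is taken from the Perl code above.

```python
import re


def strip_preamble(content):
    """Drop everything up to and including the first dashed separator.

    Python equivalent of the Perl s/^.*?-----+\\n//s: DOTALL plays the role
    of the /s flag, and count=1 makes it a single substitution.
    """
    return re.sub(r"^.*?-----+\n", "", content, count=1, flags=re.DOTALL)


# Invented response body: some preamble, a dashed line, then FASTA data.
response = (
    "Some preamble text\n"
    "more preamble\n"
    "----------\n"
    ">example_id made-up entry\n"
    "ACGTACGT\n"
)
print(strip_preamble(response))
```

If the response contains no run of five or more dashes, the text comes back
unchanged, so the FASTA parser would see the preamble as well.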

Lastly, the POD for LWP::UserAgent explains proxy configuration in more
detail, but all you have to do is

setenv http_proxy http://proxy.domain.com:8080/

or whatever is appropriate for your shell and network config.
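As a quick way to see what an environment-based proxy lookup picks up,
Python's standard library reads the same kind of variable. This is only a
sketch for comparison with env_proxy, not part of the patch, and the proxy
host and port are placeholders.

```python
import os
import urllib.request

# Placeholder proxy host and port, an assumption for illustration only.
os.environ["http_proxy"] = "http://proxy.domain.com:8080/"

# getproxies() builds a scheme -> proxy URL mapping from *_proxy variables.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```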







--
mike cariaso      ----------------     mcariaso@genelogic.com
ph:510-981-3156 ---------------------------- fax:510-649-3449
----------------- fear darkness, not fire -------------------