[Bioperl-l] Problems when retrieve sequences from GenBank

Caio Freire freire at ime.usp.br
Tue Oct 23 11:33:33 UTC 2012


Hi BioPerl users,

A couple of years ago, I wrote a script to retrieve sequences from GenBank
using the Bio::DB::GenBank module, and it worked very well on my machine under
Ubuntu 10.04. Now I'm having problems running it on Ubuntu 12.04:
the script returns a warning like "MSG: Unrecognized DBSOURCE
data: BioProject: PRJNA37833". I couldn't find an obvious solution on Google.
Could anyone help me? My script is at the bottom of this mail.



Sometimes the message is:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
HTTP/1.1 502 Bad Gateway
Connection: close
Date: Tue, 23 Oct 2012 11:08:47 GMT
Accept-Ranges: bytes
Server: Apache
Vary: accept-language,accept-charset
Content-Language: en
Content-Type: text/html; charset=iso-8859-1
Client-Date: Tue, 23 Oct 2012 11:09:25 GMT
Client-Peer: 165.112.7.20:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Link: <mailto:info at ncbi.nlm.nih.gov>; rev="made"
Title: Bad Gateway!

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Bad Gateway!</title>
<link rev="made" href="mailto:info at ncbi.nlm.nih.gov" />
<style type="text/css"><!--/*--><![CDATA[/*><!--*/
    body { color: #000000; background-color: #FFFFFF; }
    a:link { color: #0000CC; }
    p, address {margin-left: 3em;}
    span {font-size: smaller;}
/*]]>*/--></style>
</head>

<body>
<h1>Bad Gateway!</h1>
<p>


    The proxy server received an invalid
    response from an upstream server.


    </p>
<p>

    The proxy server could not handle the request <em><a
href="/entrez/eutils/efetch.fcgi">GET&nbsp;/entrez/eutils/efetch.fcgi</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p>


</p>
<p>
If you think this is a server error, please contact
the <a href="mailto:info at ncbi.nlm.nih.gov">webmaster</a>.

</p>

<h2>Error 502</h2>
<address>
  <a href="/">eutils.ncbi.nlm.nih.gov</a><br />

  <span>Tue Oct 23 07:08:47 2012<br />
  Apache</span>
</address>
</body>
</html>


STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
STACK: Bio::DB::WebDBSeqI::_stream_request /usr/share/perl5/Bio/DB/WebDBSeqI.pm:773
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/share/perl5/Bio/DB/WebDBSeqI.pm:467
STACK: Bio::DB::WebDBSeqI::get_Stream_by_id /usr/share/perl5/Bio/DB/WebDBSeqI.pm:288
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/share/perl5/Bio/DB/WebDBSeqI.pm:158
STACK: check.pl:28
-----------------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: id does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/share/perl5/Bio/DB/WebDBSeqI.pm:167
STACK: check.pl:28
-----------------------------------------------------------
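
The two exceptions above look like different failure modes: the 502 is a server-side error that may succeed on a later attempt, while "id does not exist" will not change by asking again. A minimal sketch of trapping both with eval and retrying only the first kind follows. The helper name fetch_with_retry, the retry count, the delay, and the ID 'AB000001' are all placeholders for illustration, not part of BioPerl, and the stub $flaky stands in for the real database call so the sketch runs without a network. It also assumes the thrown error stringifies to text containing the "WebDBSeqI Request Error" message shown in the log.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $fetch->($id) up to
# $max_tries times, retrying only when the error text looks like the
# "WebDBSeqI Request Error" shown in the log above.
sub fetch_with_retry {
    my ($fetch, $id, $max_tries, $delay) = @_;
    my $last_err = '';
    for my $try (1 .. $max_tries) {
        my $result = eval { $fetch->($id) };
        return ($result, undef) if defined $result;
        $last_err = $@ || 'fetch returned nothing';
        # Errors such as "id does not exist" will not fix themselves, so stop.
        last unless $last_err =~ /WebDBSeqI Request Error/;
        sleep $delay if $delay && $try < $max_tries;
    }
    return (undef, $last_err);
}

# Stand-in for the database call so the sketch runs without a network:
# it fails twice with a 502-style message, then succeeds.
my $calls = 0;
my $flaky = sub {
    my ($id) = @_;
    $calls++;
    die "MSG: WebDBSeqI Request Error:\nHTTP/1.1 502 Bad Gateway\n" if $calls < 3;
    return "record for $id";
};

my ($rec, $err) = fetch_with_retry($flaky, 'AB000001', 5, 0);
print defined $rec ? "got: $rec after $calls calls\n" : "failed: $err\n";
```

With BioPerl, the code reference would presumably wrap the real call, for example sub { $db_obj->get_Seq_by_id($_[0]) }, and a non-zero delay would avoid hammering the server between attempts.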




My script:

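#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::GenBank;

# Read the FASTA file named on the command line.
my $file = shift @ARGV or die "Usage: $0 <fasta_file>\n";
open my $fh, '<', $file or die "Cannot open $file: $!\n";

my (@names, %hash1, $name);
my ($cont, $cont1) = (0, 0);    # total sequences, mismatches
while (my $line = <$fh>) {
    $line =~ s/\s+\z//;         # otherwise the ID sent to GenBank ends in "\n"
    if ($line =~ /^>.+/) {
        # Header line: keep the name without '>' and start a new sequence.
        ($name = $line) =~ s/>//g;
        push @names, $name;
        $hash1{$name} = '';
        $cont++;
    }
    elsif ($line !~ /^>/ && defined $name) {
        # Sequence line: strip whitespace, upper-case, and append.
        $line =~ s/\s//g;
        $hash1{$name} .= uc $line;
    }
}
close $fh;

######################################################
# Compare each local sequence with the record fetched from GenBank.
my $db_obj = Bio::DB::GenBank->new;
foreach my $name (@names) {
    # The ID is the last '_'-separated field of the header.
    my $ident   = (split /_/, $name)[-1];
    my $seq_obj = $db_obj->get_Seq_by_id($ident);
    (my $GBsequencia = $seq_obj->seq()) =~ s/\n+//g;
    if ($GBsequencia ne $hash1{$name}) {
        $cont1++;
        print "Problems: $name\n";
    }
}
print "Total: $cont\tProblems: $cont1\n";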
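
The script takes the ID as the last '_'-separated field of each FASTA header. A small sketch of that step in isolation may help when checking which IDs are actually being requested. The helper name id_from_header and both headers are made up for illustration. The helper also trims the trailing newline that a line read from a file carries, and the second header shows that an accession which itself contains an underscore loses its prefix under this rule.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last '_'-separated field of a FASTA header.
sub id_from_header {
    my ($header) = @_;
    $header =~ s/^>//;        # drop the leading '>'
    $header =~ s/\s+\z//;     # drop the trailing newline and any spaces
    return (split /_/, $header)[-1];
}

# Made-up headers, for illustration only.
for my $header (">HumanStrain_AB000001\n", ">sample_NC_000913\n") {
    my $id = id_from_header($header);
    print "id = '$id'\n";
}
```

Printing the ID between quotes, as here, makes a stray newline or a truncated accession easy to spot before any request is sent.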





 Best,
===============================================================
Caio César de Melo Freire, BSc Biomedicine

PhD candidate - Bioinformatics

Laboratory of Molecular Evolution and Bioinformatics
Institute of Biomedical Sciences - II

University of Sao Paulo
+551130918453
Av. Prof. Lineu Prestes, 1374 - Cidade Universitária "Armando Salles
Oliveira", Butantã - São Paulo - SP - CEP 05508-900
================================================================



