[Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB

Tue Mar 13 16:41:36 UTC 2012

Hi,

I get the same error as you, although I should also note that I'm not 
familiar with this module, so I may be missing a problem with the HowTo 
code. Also (like you) I have an old version of BioPerl installed, so 
perhaps you could try upgrading your BioPerl to the most recent version 
(1.6.901) from CPAN or bioperl-live from GitHub? There have probably 
been modifications to Bio::DB::GenBank since 1.6.1.

One thing I noticed - the accession numbers you quote are from RefSeq, 
not GenBank (the NCBI make the two difficult to distinguish in Entrez, 
but RefSeq accessions contain an underscore). I tried replacing 
Bio::DB::GenBank with Bio::DB::RefSeq and that seemed to work - 
according to the docs the RefSeq module downloads from the EBI rather 
than the NCBI.
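
A minimal sketch of that swap, in case it helps. The helper names (looks_like_refseq, db_class_for) are made up for illustration and are not part of BioPerl, and the underscore check is only the rule of thumb mentioned above, not an official test. It also assumes the remote service behind Bio::DB::RefSeq is reachable and knows the accession. Nothing is fetched unless an accession is given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: rule of thumb only. RefSeq accessions have a
# two-letter prefix followed by an underscore (NM_000344, NC_000005.9);
# GenBank accessions contain no underscore.
sub looks_like_refseq {
    my ($acc) = @_;
    return $acc =~ /^[A-Z]{2}_[A-Z0-9]+(?:\.\d+)?$/ ? 1 : 0;
}

# Hypothetical helper: choose which BioPerl database class to try.
sub db_class_for {
    my ($acc) = @_;
    return looks_like_refseq($acc) ? 'Bio::DB::RefSeq' : 'Bio::DB::GenBank';
}

# Network access only happens when an accession is passed as an argument.
if (my $acc = shift @ARGV) {
    my $class = db_class_for($acc);
    # Load the chosen class lazily so the helpers work without BioPerl.
    eval "require $class; 1" or die "Cannot load $class: $@";
    my $db  = $class->new();
    my $seq = $db->get_Seq_by_acc($acc);
    printf "%s\t%d bp\n", $seq->display_id, $seq->length;
}
```

Saved under any name, it would be run as, for example, "perl fetch_by_acc.pl NM_000344" (the file name is arbitrary).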

Cheers,
Roy.

On 13/03/2012 08:30, yun YAN wrote:
> Dear Roy,
> Many thanks for your reply. I tried it as soon as I received your
> mail. However, it reports an error:
>
>     MSG: acc NM_000344 does not exist
>     STACK: Error::throw
>     STACK: Bio::Root::Root::throw
>     /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
>     STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
>     /usr/local/share/perl/5.10.1/Bio/DB/WebDBSeqI.pm:195
>     STACK: test_gene_bank_with_sublocation.pl:26
>
> I've checked my code repeatedly and still cannot figure out where the
> bug is. At first I thought it might not support genome assemblies
> (NC_000005), so I tried the SMN1 gene directly (NM_000344). Neither of
> them works. Even the simplest code still reports the error "acc
> NM_000344 does not exist", although the accession number does exist:
> http://www.ncbi.nlm.nih.gov/nuccore/NM_000344.3
> My test code (copied almost exactly from the HOWTO tutorial) is:
>
>     use strict;
>     use warnings;
>     use Bio::DB::GenBank;
>     my $gb = Bio::DB::GenBank->new(
>         -format    => 'genbank',
>         -seq_start => 1,
>         -seq_stop  => 2000,
>         -strand    => 1,
>     );
>     my $seq_obj = $gb->get_Seq_by_acc('NM_000344');
>     print $seq_obj; #just for test
>
> My Perl is currently 5.10.1 and BioPerl is at 1.6.1. All code runs
> on Ubuntu 10.04 LTS. I've checked the Bio::DB::GenBank module in
> version 1.6.1, and it supports the -seq_start and -seq_stop arguments.
> Any ideas? I hope I haven't made some basic mistake. I look forward to
> your reply.
> Thanks.
>
> On Mon, Mar 12, 2012 at 8:38 PM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:
>
>     I think this is what you want:
>     http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object
>
>
>     On 12/03/2012 05:33, yun YAN wrote:
>
>         My goal is to get both the exon and intron regions of a gene of
>         interest from a remote database (NCBI) with the help of
>         Bio::DB::GenBank. "get_Seq_by_acc" works for most cases, but it
>         seems that it cannot be used for exon/intron parsing.
>
>         Take the gene SMN1:
>         http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839
>         The exon/intron information is only available in the genome
>         assembly record, and the accession number (NC_000005,
>         http://www.ncbi.nlm.nih.gov/nuccore/NC_000005) is actually the
>         genome contig, not the gene. To define my gene SMN1, an additional
>         argument "REGION" is needed (REGION: 70220768..70248839). If I
>         simply use "get_Seq_by_acc", it returns the whole genome assembly
>         record rather than the gene.
>
>         So, any ideas about how to retrieve the gene (not the mRNA),
>         containing both exons and introns? Is there perhaps an additional
>         argument, something like get_Seq_by_acc('XXXX') plus
>         REGION(1234..6789)?
>
>         I want to use the command line as much as possible. Until now I
>         have copied the page (which is laid out in strict GenBank format),
>         pasted it into a GenBank file, and then used Bio::DB::GenBank
>         LOCALLY. The first step is done by hand through the graphical
>         interface, which is not convenient.
>
>         Thanks