[Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB

Tue Mar 13 04:30:43 EDT 2012

Dear Roy,
Many thanks for your reply. I tried it as soon as I received your mail.
However, it reports an error:

MSG: acc NM_000344 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/local/share/perl/5.10.1/Bio/DB/WebDBSeqI.pm:195
STACK: test_gene_bank_with_sublocation.pl:26

I have checked my code repeatedly and still cannot figure out where the bug
is. At first I thought it might not support genome assembly records
(NC_000005), so I tried the SMN1 RefSeq mRNA directly (NM_000344). Neither
works. Even the simplest code reports the error "acc NM_000344 does not
exist", although the accession does exist:
http://www.ncbi.nlm.nih.gov/nuccore/NM_000344.3.
My test code (copied almost exactly from the HOWTO tutorial) is:

use strict;
use warnings;
use Bio::DB::GenBank;

# Fetch only bases 1..2000 of the record, on the plus strand
my $gb = Bio::DB::GenBank->new(-format    => 'genbank',
                               -seq_start => 1,
                               -seq_stop  => 2000,
                               -strand    => 1);
my $seq_obj = $gb->get_Seq_by_acc('NM_000344');
print $seq_obj->display_id, ' ', $seq_obj->length, "\n";    # just for test

I am currently using Perl 5.10.1 and BioPerl 1.6.1 on Ubuntu 10.04 LTS. I
checked the Bio::DB::GenBank module in version 1.6.1, and it does support the
-seq_start and -seq_stop arguments.
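For reference, here is a sketch of the whole command-line workflow I am
aiming for, using the NC_000005.9 coordinates from my first mail. It is
untested and rests on assumptions: network access to NCBI, that
get_Seq_by_acc accepts a versioned accession, and that the returned GenBank
record carries mRNA features whose split (join) locations mark the exons.
The script name, the output file name and the introns_from_exons helper are
my own hypothetical choices, not part of BioPerl.

```perl
#!/usr/bin/perl
# Sketch only. Assumes: BioPerl with -seq_start/-seq_stop support in
# Bio::DB::GenBank, network access to NCBI, and mRNA features whose
# split locations give the exons. introns_from_exons is a hypothetical helper.
# Usage: perl get_region.pl NC_000005.9 70220768 70248839
use strict;
use warnings;

# Introns are the gaps between consecutive exon spans ([start, end] pairs).
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my ($start, $end) = ($exons[$i - 1][1] + 1, $exons[$i][0] - 1);
        push @introns, [$start, $end] if $start <= $end;
    }
    return @introns;
}

sub main {
    my ($acc, $from, $to) = @_;
    die "usage: $0 ACCESSION FROM TO\n" unless defined $to;

    # Loaded at run time so the helper above works without BioPerl.
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    # Ask NCBI for just the region from..to, plus strand; no -format is
    # given, so the module's default (GenBank) is used.
    my $gb = Bio::DB::GenBank->new(
        -seq_start => $from,
        -seq_stop  => $to,
        -strand    => 1,
    );
    my $seq = $gb->get_Seq_by_acc($acc);

    # Save the region locally instead of copying it from the web page.
    my $out = Bio::SeqIO->new(-file => ">$acc.$from-$to.gb", -format => 'genbank');
    $out->write_seq($seq);

    # Coordinates printed here are relative to the retrieved region.
    for my $feat (grep { $_->primary_tag eq 'mRNA' } $seq->get_SeqFeatures) {
        my ($id) = $feat->has_tag('transcript_id')
            ? $feat->get_tag_values('transcript_id') : ('mRNA');
        my @exons = map { [ $_->start, $_->end ] } $feat->location->each_Location;
        print "$id exon   $_->[0]..$_->[1]\n" for sort { $a->[0] <=> $b->[0] } @exons;
        print "$id intron $_->[0]..$_->[1]\n" for introns_from_exons(@exons);
    }
    return 0;
}

# Run only when arguments are given, so the helper can be reused on its own.
main(@ARGV) if @ARGV;
```

With no arguments the script only defines the helper; with an accession and
two coordinates it writes the region to a local GenBank file and lists exon
and intron spans for each mRNA feature.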
Any ideas? I hope I haven't made some basic mistake. I look forward to your
reply.
Thanks.

On Mon, Mar 12, 2012 at 8:38 PM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:

> I think this is what you want:
> http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object
>
>
> On 12/03/2012 05:33, yun YAN wrote:
>
>> My goal is to retrieve both the exon and intron regions of a gene of
>> interest from a remote database (NCBI) with Bio::DB::GenBank.
>> "get_Seq_by_acc" works for most cases, but it seems it cannot be used to
>> get the exons and introns.
>>
>> Take the gene SMN1 as an example:
>> http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839
>> The exon/intron information is only available in the genome assembly
>> record, and the accession number (NC_000005,
>> http://www.ncbi.nlm.nih.gov/nuccore/NC_000005) refers to the genomic
>> sequence, not the gene. To define my gene SMN1, an additional argument
>> "REGION" is needed (REGION: 70220768..70248839). If I simply use
>> "get_Seq_by_acc", it does not return the gene but the whole genome
>> assembly record.
>>
>> So, any ideas on how to retrieve the gene (not the mRNA) with both its
>> exons and introns? Is there perhaps an additional argument to
>> get_Seq_by_acc('XXXX'), something like REGION(1234..6789)?
>>
>> I want to use the command line as much as possible. Until now I have
>> copied the page (it is laid out in strict GenBank format), pasted it into a
>> GenBank file, and then processed that file locally with BioPerl. The first
>> step is done by hand through the graphical interface, which is not
>> convenient.
>>
>> Thanks
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>