[Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB

yun YAN youryanyun at gmail.com
Mon Mar 12 05:33:47 UTC 2012


My goal is to retrieve both the exon and intron regions of a gene of
interest from a remote database (NCBI) with the help of Bio::DB::GenBank.
"get_Seq_by_acc" works for most cases, but it seems that it cannot be used
for exon/intron parsing.

Take the gene SMN1 as an example:
http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839
The exon/intron information is available only in the genome assembly
record, and the accession number (NC_000005,
http://www.ncbi.nlm.nih.gov/nuccore/NC_000005) actually identifies the
chromosome-level genomic sequence, not the gene. To define my gene SMN1, an
additional argument "REGION" is needed (REGION: 70220768..70248839). If I
simply use "get_Seq_by_acc", it does not return the gene; it returns the
whole genome assembly record.
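
To make the region concrete, here is a minimal sketch (in Python, not
BioPerl) of the same slice written as an NCBI E-utilities efetch request.
It only builds the URL and does not contact NCBI. The helper name
region_url is hypothetical, and the sketch assumes that the seq_start and
seq_stop parameters of efetch match the from/to values in the page URL
above.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_url(accession, start, stop):
    """Build an efetch URL for a 1-based, inclusive slice of a record.

    region_url is a hypothetical helper written for this illustration.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # e.g. the chromosome accession
        "rettype": "gb",      # GenBank flat-file format
        "retmode": "text",
        "seq_start": start,   # first base of the region
        "seq_stop": stop,     # last base of the region
    }
    return EFETCH + "?" + urlencode(params)


# SMN1 region on NC_000005.9, coordinates taken from the URL above.
smn1_url = region_url("NC_000005.9", 70220768, 70248839)
print(smn1_url)
```

If this assumption holds, the text returned by such a URL could be saved as
a GenBank file from the command line instead of being copied from the
browser.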

So, are there any ideas on how to retrieve the gene (not the mRNA),
including both exons and introns? Is there perhaps an additional argument,
something like get_Seq_by_acc('XXXX') with REGION(1234..6789)?

I want to use the command line as much as possible. Until now I have copied
the page (it is laid out in strict GenBank format), pasted it into a
GenBank file, and then used Bio::DB::GenBank LOCALLY. The first step is
done by hand through the graphical interface, which is not convenient.

Thanks


