[Bioperl-l] Genome Information

shalabh sharma shalabh.sharma7 at gmail.com
Tue Oct 26 16:23:02 UTC 2010


Hey Chris,

This information is really useful. I was using Bio::DB::Taxonomy for the
taxonomy information and Bio::DB::EUtilities to get the genome size (I
didn't know that I could use Bio::DB::EUtilities for all of it).
I was quite confused about how to get the GC% and coding% information, but
I think WWW::Mechanize might help me out.
I really appreciate your help.

-Shalabh


On Tue, Oct 26, 2010 at 12:11 PM, Chris Fields <cjfields at illinois.edu> wrote:

> Shalabh,
>
> I don't know if there is a quick one-step way of getting this information
> via NCBI w/o wrangling with query term limit magic, and even then you will
> be bound to whatever version of the genome is present within the database of
> interest.
>
> For instance, via eutils you can get summary information for various taxa,
> genomes, and genome projects using the following example code (prints the
> first 10 archaeal summaries from the chosen database, here 'genome'; set
> the '-db' parameter to one of 'genomeprj', 'taxonomy', 'genome'):
>
> =================================
>
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
>
> my $term = "Archaea[ORGN]";
>
> # Step 1: esearch, keeping the result set on the NCBI history server
> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                     -db    => 'genome',
>                                     -email => 'cjfields at bioperl.org', # use your own address
>                                     -usehistory => 'y',
>                                     -term  => $term);
>
> my $hist = $eutil->next_History || die "No history returned";
>
> # Step 2: reuse the same object to run esummary on the stored results
> $eutil->set_parameters(-eutil   => 'esummary',
>                       -history => $hist,
>                       -retmax  => 10);
>
> $eutil->print_all; # print summary info to STDOUT
>
> =================================
>
> GC and coding % don't appear to be stored in any of the above databases,
> but they are displayed via the genome overview.  You could probably use
> something like WWW::Mechanize to grab the summary table information
> displayed using the Genome UID:
>
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=ShowDetailView&TermToSearch=25024
>
> Just don't spam the server with a billion requests (use a timeout!) or
> you'll find yourself blocked.  I may pop an email to NCBI to see if this
> information is programmatically accessible.
>
> chris
>
> On Oct 26, 2010, at 9:09 AM, shalabh sharma wrote:
>
> > Hi All,
> >       I have thousands of taxon IDs and I need to find the following
> > information about the corresponding genomes:
> > 1) Taxonomy information
> > 2) GC%
> > 3) total coding genes %
> >
> > I can easily find the taxonomy info by using Bio::DB::Taxonomy, but for
> > the other two I am stuck.
> > Is there any way I can find this info?
> > I would really appreciate your help.
> >
> > Thanks
> > Shalabh
> > -------------------------------
> > Shalabh Sharma
> > Scientific Computing Professional Associate (Bioinformatics Specialist)
> > Department of Marine Sciences
> > University of Georgia
> > Athens, GA 30602-3636
>
>
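
A minimal sketch of the throttled WWW::Mechanize approach suggested in the
quoted reply, assuming the genome overview URL pattern given there still
applies. The helper names (genome_overview_url, fetch_overviews), the
3-second pause and the 30-second timeout are illustrative assumptions, not
anything NCBI specifies. Extracting GC% or coding% from the returned HTML is
left out because the page layout is not described in this thread.

```perl
use strict;
use warnings;

# Base of the genome overview URL quoted in the thread.
my $BASE = 'http://www.ncbi.nlm.nih.gov/sites/entrez';

# Build the overview URL for one Genome UID (hypothetical helper).
sub genome_overview_url {
    my ($uid) = @_;
    die "Genome UID must be numeric\n"
        unless defined $uid && $uid =~ /^\d+$/;
    return "$BASE?Db=genome&Cmd=ShowDetailView&TermToSearch=$uid";
}

# Fetch the overview page for each UID, pausing between requests.
# Returns a hash reference of UID => raw HTML (undef if the request failed).
sub fetch_overviews {
    my ($uids, $delay) = @_;
    $delay = 3 unless defined $delay;   # seconds between requests (assumed value)

    require WWW::Mechanize;             # loaded lazily; only needed when fetching
    my $mech = WWW::Mechanize->new(
        autocheck => 0,                 # check success() ourselves instead of dying
        timeout   => 30,                # passed through to LWP::UserAgent
    );

    my %html;
    for my $uid (@$uids) {
        $mech->get( genome_overview_url($uid) );
        $html{$uid} = $mech->success ? $mech->content : undef;
        sleep $delay;                   # be polite to the server
    }
    return \%html;
}
```

Calling fetch_overviews([25024]) would request the single page from the
example URL. For thousands of IDs, the pause between requests matters more
than the connection timeout, so it is worth erring on the slow side.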


