[Bioperl-l] Bio::DB::EntrezGene or Bio::DB::Query::GenBank to obtain sequence metadata without sequence

Chris Fields cjfields at illinois.edu
Sun Oct 11 18:53:18 EDT 2009


On Oct 11, 2009, at 5:03 PM, Dan Kortschak wrote:

> Hi Russell,
>
> I ended up using a hodgepodge of Bio::DB::GenBank and
> Bio::DB::EUtililties
>
> #!/usr/bin/perl
>
> use Bio::DB::EUtilities;
> use Bio::DB::GenBank;
>
> my @uids = qw(89161185 89161199 89161205 89161207 51511721 89161210  
> 89161213 51511724 89161216 89161187 51511727 89161190 51511729  
> 51511730 51511731 51511732 51511734 51511735 42406306 51511747  
> 51511750 89161203 17981852 89161218 89161220);
>
> my $gb = Bio::DB::GenBank->new(-complexity => 1,-seq_stop=>1);
> my $seqio = $gb->get_Stream_by_gi(\@uids);
> my $summary = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                           -db => 'nucleotide',
>                           -id => \@uids);
>
> print "chromosome,refSeq,name,length\n";
>
> my $index=0;
>
> while (my $seq = $seqio->next_seq() and my $ds = $summary- 
> >next_DocSum) {
> 	warn "Database queries don't reconcile",$seq->primary_id,"-",$ds- 
> >get_id,"\n" if $seq->primary_id != $ds->get_id;
> 	print $index++,",",$seq->id(),".",$seq->version,",";
> 	($feat)=$seq->get_SeqFeatures;
> 	if( defined $seq->species && $feat->annotation->get_Annotations 
> ('chromosome')) {
> 		print $seq->species->binomial;
> 		print " chromosome ",$feat->annotation->get_Annotations 
> ('chromosome'),",";
> 	} elsif (defined $seq->species && $feat->annotation->get_Annotations 
> ('organelle')) {
> 		print $seq->species->binomial;
> 		print " ",$feat->annotation->get_Annotations('organelle'),",";
> 	} else {
> 		$_=$seq->desc;
> 	        /^([[:alnum:] ]+)[[:graph:]]/;
> 		print "$1,";
> 	}
> 	print $ds->get_contents_by_name('Length') if $ds- 
> >get_contents_by_name('Length');
> 	print "\n";
> }
>
>
> If Bio::DB:EUtilities gives rich seq or equivalent I might change over
> from Bio::DB::GenBank, but it pretty much works at the moment (the
> fail-overs give me grief and I don't like the kludge of asking for a
> single base, but they work to get some of the details that the DocSum
> doesn't - sensible title for example).
>
> cheers
> Dan

Bio::DB::EUtilities is a generic front-end to NCBI's eutils, so you  
can retrieve the gene info in whatever format via efetch and pass it  
on to the appropriate parser (Bio::SeqIO::entrezgene, which requires  
ASN1 at this time).  It's probably best to save the ASN1 data to a  
temp file prior to that (getting the raw content as a file handle  
isn't set up yet).

chris



More information about the Bioperl-l mailing list