[Bioperl-l] get_Stream_by_query() appears to return empty stream

Brian Osborne bosborne11 at verizon.net
Tue Nov 20 23:50:00 UTC 2012


Felix,

I took a look at the Bio::DB::Query::GenBank documentation, it says this:

If you provide an array reference of IDs in -ids, the query will be ignored and the list of IDs will be used when the query is passed to a Bio::DB::GenBank object's get_Stream_by_query() method. 

Bio::DB::Genbank queries "nucleotide", by default. You have GeneIDs. I see that you're setting "-id" to "gene" but note that you're passing that query to a plain Bio::DB::GenBank object. Not sure what the expected behavior is here.

I would try using the NCBI Eutilities for that second query, rather than Bio::DB::Query::GenBank (http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook).

Brian O.

On Nov 1, 2012, at 8:01 PM, Felix Horns <rfhorns at gmail.com> wrote:

> Hello everyone.
> 
> I am having trouble using the get_Stream_by_query() function
> in Bio::DB::GenBank.  It seems to return an empty stream, such that
> $stream->next_seq never returns anything.
> 
> However, $query->count is returning the expected value (139).  Also,
> get_Stream_by_query() seems to be querying the database, as when I pass it
> an array of GeneIDs that have not been properly formatted, i.e.
> GeneID:7816864, instead of simply 7816864, it returns warnings and errors:
> "MSG: Warning(s) from GenBank: <PhraseNotFound>GeneID 7817709...; MSG:
> Error from Genbank: No items found.".
> 
> I have included my full code below. I have also included the output from
> the code below that.  The code is intended to find genes located within a
> genomic region. I will later find the protein domains and pathways that
> those genes are involved in.
> 
> Any help would be greatly appreciated.  I realize that this is probably a
> very simple question, but I am relatively new to BioPerl and I've spent the
> better part of the day trying to figure out such issues, so I would be very
> thankful for help.
> 
> Felix
> 
> 
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::DB::EntrezGene;
> use Bio::DB::GenBank;
> 
> # Load reference sequence
> # Load from local .gb file
> # Note that .gb file does not include sequences
> # my $gbfile = "NC_012660.1.gb";
> # my $seqio = Bio::SeqIO->new(-file => $gbfile);
> # my $ref_seq = $seqio->next_seq;
> 
> # To access reference sequence programatically, uncomment this code
> my $gb = new Bio::DB::GenBank;
> my $ref_seq = $gb->get_Seq_by_acc("NC_012660.1");
> 
> # Specify coordinates of gap
> my $gap_start = 2050506;
> my $gap_end = 2190530;
> 
> my $gene_count = 0;
> my @features;
> my @starts;
> my @ends;
> my @db_xrefs;
> 
> my @products;
> my @protein_ids;
> 
> # Get gene features in gap
> for my $feat ($ref_seq->get_SeqFeatures) {
>  my $start=$feat->location->start;
>  my $end=$feat->location->end;
> 
>  if (($feat->primary_tag eq 'gene') &
>      ($gap_start < $start) & ($start < $gap_end) &
>      ($gap_start < $end) & ($end < $gap_end)) {
> 
>    $gene_count += 1;
> 
>    # Get GeneID reference
>    my $db_xref = ($feat->get_tag_values('db_xref'))[0];
>    $db_xref =~ s/GeneID://;    # Trim "GeneID:" from start of $db_xref
> 
>    push @features, $feat;
>    push @starts, $start;
>    push @ends, $end;
>    push @db_xrefs, $db_xref;
>  }
> }
> 
> # Get data about gene features from GeneID reference
> my $query = Bio::DB::Query::GenBank->new(-db => 'gene',
> -ids => [@db_xrefs]);
> my $stream = $gb->get_Stream_by_query($query);
> 
> while (my $seq = $stream->next_seq) {
>  for my $feat ($seq->all_SeqFeatures) {
>    print "primary tag: ", $feat->primary_tag, "\n";
>    for my $tag ($feat->get_all_tags) {
>      print "  tag: ", $tag, "\n";
>      for my $value ($feat->get_tag_values($tag)) {
> print "    value: ", $value, "\n";
>      }
>    }
>  }
> }
> 
> print $query->count,"\n";
> print $gene_count, "\n";
> 
> 
> OUTPUT
>> perl analyze_gap.pl
> 139
> 139
> 
> Note that no "primary tag; tag; value" items are printed.  Furthermore,
> when I put a print line immediately after the (while (my $seq =
> $stream->next_seq)) statement, it was never called, seemingly indicating
> that the stream is empty.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list