[Bioperl-l] need help urgently

Tue Sep 15 08:29:02 UTC 2009

When I use Bio::DB::EntrezGene and Bio::DB::EUtilities, the accession
and sequence returned for a gene are those of the second accession
listed under "Genome Reference Consortium Human Build 37 Primary
Assembly". For example, for Entrez Gene id 3630, the code returns
accession NT_009237.18. But I actually want the sequence of the first
accession, i.e. NC_000011.9.

Please let me know how I could get that. Any help will be great.

-Neeti
Even my blood says, B positive
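
One possible way to approach this, as a rough sketch only: if the record exposes more than one viewer-style link (an assumption, not something confirmed in this thread), the same regular expression used in the quoted script below can be applied to each URL, keeping the first hit whose accession starts with "NC_". The helper name pick_region and the example.org URLs with their coordinates are made up for illustration; only the two accessions come from the message above. Note that the start and stop must be taken from the same URL as the accession, since chromosome (NC_) and contig (NT_) records use different coordinate systems.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan viewer-style URLs and
# return (accession, from, to) for the first one whose accession starts
# with the requested prefix. Returns an empty list if nothing matches.
sub pick_region {
    my ($prefix, @urls) = @_;
    for my $url (@urls) {
        # Same pattern as the script quoted in this thread.
        my ($acc, $from, $to) =
            $url =~ /contig=([^&]+).+from=(\d+)&to=(\d+)/
            or next;    # skip URLs that do not carry all three fields
        return ($acc, $from, $to) if index($acc, $prefix) == 0;
    }
    return;
}

# Placeholder URLs and coordinates, for illustration only.
my @urls = (
    'http://example.org/viewer?contig=NT_009237.18&from=100&to=200',
    'http://example.org/viewer?contig=NC_000011.9&from=300&to=400',
);

my ($acc, $from, $to) = pick_region('NC_', @urls);
if (defined $acc) {
    print "$acc\t$from\t$to\n";
}
else {
    print "no NC_ accession found\n";
}
```

The resulting accession, start and stop could then be passed to the efetch call in the quoted script in place of the values it extracts itself.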

On Mon, Sep 14, 2009 at 1:52 PM, Neeti Somaiya <neetisomaiya at gmail.com> wrote:
> Thanks a lot. This works for me.
>
> One more question: can you point me to where exactly we can find
> the link to this FASTA sequence, which we are retrieving here through
> the code, in the gene's actual entry in Entrez Gene on the NCBI website
> (http://www.ncbi.nlm.nih.gov/sites/entrez)?
>
> -Neeti
> Even my blood says, B positive
>
>
>
> On Tue, Sep 8, 2009 at 10:11 AM, Smithies, Russell
> <Russell.Smithies at agresearch.co.nz> wrote:
>> That bit of code gave you the accession, start and end for the sequence, so you just needed to download it.
>> Bio::DB::EUtilities can do that for you.
>>
>> Did you take a look at http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences ?
>>
>>
>>
>> --Russell
>>
>> ==================
>> #!perl -w
>>
>> use strict;
>> use Bio::DB::EntrezGene;
>> use Bio::DB::EUtilities;
>>
>> no warnings 'deprecated';
>>
>> my $id = shift or die "Id?\n"; # use a Gene id
>>
>> my $db = Bio::DB::EntrezGene->new;
>> #$db->verbose(1);
>> my $seq = $db->get_Seq_by_id($id);
>>
>> my $ac = $seq->annotation;
>>
>> for my $ann ($ac->get_Annotations('dblink')) {
>>     if ($ann->database eq "Evidence Viewer") {
>>         # get the sequence identifier, the start, and the stop
>>         my ($acc, $from, $to) = $ann->url =~
>>           /contig=([^&]+).+from=(\d+)&to=(\d+)/;
>>         # skip links whose URL does not carry all three fields
>>         next unless defined $acc;
>>         print "$acc\t$from\t$to\n";
>>
>>         # retrieve the sequence as FASTA text
>>         my $fetcher = Bio::DB::EUtilities->new(-eutil   => 'efetch',
>>                                                -db      => 'nucleotide',
>>                                                -rettype => 'fasta');
>>         $fetcher->set_parameters(-id        => $acc,
>>                                  -seq_start => $from,
>>                                  -seq_stop  => $to,
>>                                  -strand    => 1);
>>         # named $fasta so it does not shadow the outer $seq
>>         my $fasta = $fetcher->get_Response->content;
>>         print $fasta;
>>     }
>> }
>>
>> ======================
>>
>>> -----Original Message-----
>>> From: Neeti Somaiya [mailto:neetisomaiya at gmail.com]
>>> Sent: Tuesday, 8 September 2009 4:28 p.m.
>>> To: Smithies, Russell
>>> Cc: Emanuele Osimo; bioperl-l
>>> Subject: Re: [Bioperl-l] need help urgently
>>>
>>> I actually want the nucleotide sequence of the gene. I thought
>>> Bio::DB::EntrezGene would give me a seq_obj for an Entrez Gene id and
>>> that calling $seq_obj->seq() on it would give me the actual
>>> genomic nucleotide sequence of the gene. But this doesn't happen. I am
>>> able to print the gene symbol using $seq_obj->display_id and to do
>>> other things, but I wanted the gene's nucleotide sequence.
>>>
>>> -Neeti
>>> Even my blood says, B positive
>>>
>>>
>>>
>>> On Tue, Sep 8, 2009 at 1:56 AM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
>>> > This example code from the wiki _definitely_ works:
>>> >
>>> > http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates
>>> > =========================================
>>> >
>>> > use strict;
>>> > use Bio::DB::EntrezGene;
>>> >
>>> > my $id = shift or die "Id?\n"; # use a Gene id
>>> >
>>> > my $db = new Bio::DB::EntrezGene;
>>> > $db->verbose(1); ###
>>> >
>>> > my $seq = $db->get_Seq_by_id($id);
>>> >
>>> > my $ac = $seq->annotation;
>>> >
>>> > for my $ann ($ac->get_Annotations('dblink')) {
>>> >        if ($ann->database eq "Evidence Viewer") {
>>> >                # get the sequence identifier, the start, and the stop
>>> >                my ($contig,$from,$to) = $ann->url =~
>>> >                  /contig=([^&]+).+from=(\d+)&to=(\d+)/;
>>> >                print "$contig\t$from\t$to\n";
>>> >        }
>>> > }
>>> >
>>> > ======================================
>>> >
>>> > So if it doesn't work for you, there are a few things you need to check:
>>> > * what version of BioPerl are you using?
>>> > * are you behind a firewall?
>>> > * are you using a proxy?
>>> > * do you need to submit a username/password for either of the 2 above?
>>> > * turn on 'verbose' messages, it may help you debug
>>> >
>>> >
>>> > If you're still having problems, get back to me and I'll see if I can help.
>>> >
>>> > --Russell
>>> >
>>> >
>>> >> -----Original Message-----
>>> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Neeti Somaiya
>>> >> Sent: Monday, 7 September 2009 10:04 p.m.
>>> >> To: Emanuele Osimo; bioperl-l
>>> >> Subject: Re: [Bioperl-l] need help urgently
>>> >>
>>> >> I tried using EntrezGene instead of GenBank, as is given in the link
>>> >> that you sent :
>>> >>
>>> >>
>>> >> http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database
>>> >>
>>> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/EntrezGene.html
>>> >>
>>> >> use Bio::DB::EntrezGene;
>>> >>
>>> >>     my $db = Bio::DB::EntrezGene->new;
>>> >>
>>> >>     my $seq = $db->get_Seq_by_id(2); # Gene id
>>> >>
>>> >>     # or ...
>>> >>
>>> >>     my $seqio = $db->get_Stream_by_id([2, 4693, 3064]); # Gene ids
>>> >>     while ( my $seq = $seqio->next_seq ) {
>>> >>           print "id is ", $seq->display_id, "\n";
>>> >>     }
>>> >>
>>> >> This doesn't seem to work.
>>> >>
>>> >>
>>> >> -Neeti
>>> >> Even my blood says, B positive
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Sep 4, 2009 at 1:09 PM, Emanuele Osimo<e.osimo at gmail.com> wrote:
>>> >> > Hello,
>>> >> > have you tried this?
>>> >> >
>>> >> > http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates
>>> >> >
>>> >> > Emanuele
>>> >> >
>>> >> > On Fri, Sep 4, 2009 at 08:49, Neeti Somaiya <neetisomaiya at gmail.com> wrote:
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> I have an input list of gene names (can get gene ids from a local db
>>> >> >> if required).
>>> >> >> I need to fetch sequences of these genes. Can someone please guide me
>>> >> >> as to how this can be done using perl/bioperl?
>>> >> >>
>>> >> >> Any help will be deeply appreciated.
>>> >> >>
>>> >> >> Thanks.
>>> >> >>
>>> >> >> -Neeti
>>> >> >> Even my blood says, B positive
>>> >> >> _______________________________________________
>>> >> >> Bioperl-l mailing list
>>> >> >> Bioperl-l at lists.open-bio.org
>>> >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> >> >
>>> >> >
>>> >
>>
>