[Bioperl-l] Problem retrieving CDS by Accession #

Sean Davis sdavis2 at mail.nih.gov
Thu Sep 7 15:48:39 UTC 2006


On Thursday 07 September 2006 10:32, Ryan Golhar wrote:
> > On Thursday 07 September 2006 01:09, Ryan Golhar wrote:
> > > Hi,
> > >
> > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid
> > > accession #, XM_547879.2, for instance.
> > >
> > > I get the message in return:
> > >
> > > -------------------- WARNING ---------------------
> > > MSG: acc (gb|XM_547879.2) does not exist
> > > ---------------------------------------------------
> > >
> > > If I go to NCBI, and enter the accession, the GenBank entry comes up.
> > >
> > > At first I suspected it was the version number, but removing the
> > > version number still causes the same error.
> > >
> > > Am I doing something wrong?
> >
> > From the docs for Bio::DB::GenBank:
> >
> >     $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number
> >     $seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version
> >     $seq = $gb->get_Seq_by_gi('405830'); # GI Number
> >
> > So, you might try using get_Seq_by_version(....).  I didn't test it, but
> > give that a shot.
>
> get_Seq_by_version() worked.
>
> That does not explain why get_Seq_by_acc does not work with the primary
> part of the accession #.

To see why this shouldn't work, consider Entrez (the online version): searching
for an accession without a version brings up the newest version of that
record.  If one specifies the version, though, one gets that version, even if
it is not the newest.  So if get_Seq_by_acc() accepted a versioned accession
and ignored the version, it could return the wrong version of the record.
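
The distinction can be sketched as a small dispatcher.  This is only an
illustration, not something BioPerl provides: fetch_record is a hypothetical
name, $gb is assumed to be a Bio::DB::GenBank object (or anything with the
same two methods), and a versioned ID is assumed to end in ".N".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): choose the lookup method
# from the shape of the identifier.
#   "XM_547879.2" -> get_Seq_by_version (that exact version)
#   "XM_547879"   -> get_Seq_by_acc     (newest version)
sub fetch_record {
    my ($gb, $id) = @_;

    # Assumption: a version suffix is a dot followed by digits at the end.
    if ($id =~ /\.\d+\z/) {
        return $gb->get_Seq_by_version($id);
    }
    return $gb->get_Seq_by_acc($id);
}
```

With a real database handle this would be called as
fetch_record(Bio::DB::GenBank->new, 'XM_547879.2'), which needs network
access to NCBI.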

If you know that you want the most recent version, just strip the version 
information and use get_Seq_by_acc().
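
A minimal sketch of that, assuming the version is a trailing ".N" suffix.
strip_version and fetch_latest are hypothetical helper names, not BioPerl
functions, and the fetch needs network access to NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop a trailing ".N" version
# suffix, e.g. "XM_547879.2" becomes "XM_547879". IDs without a suffix
# are returned unchanged.
sub strip_version {
    my ($id) = @_;
    (my $acc = $id) =~ s/\.\d+\z//;
    return $acc;
}

# Fetch the newest record for an accession, whatever version was supplied.
# Bio::DB::GenBank is loaded lazily so strip_version works without it.
sub fetch_latest {
    my ($id) = @_;
    require Bio::DB::GenBank;
    my $gb = Bio::DB::GenBank->new;
    return $gb->get_Seq_by_acc( strip_version($id) );
}
```

For example, fetch_latest('XM_547879.2') would ask get_Seq_by_acc() for
'XM_547879'.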

Sean