[Bioperl-l] Genbank Bioperl problem

Brian Osborne brian_osborne@cognia.com
Thu, 13 Jun 2002 14:43:34 -0400


Ali,

This works for me:

use Bio::DB::RefSeq;
my $db  = Bio::DB::RefSeq->new;
my $seq = $db->get_Seq_by_id("NP_457465");
print $seq->seq, "\n";

This one does not:

use Bio::DB::RefSeq;
my $db  = Bio::DB::RefSeq->new;
my $seq = $db->get_Seq_by_id("NP_457465.1");
print $seq->seq, "\n";

So I admit to some puzzlement, since NP_457465.1 does exist in RefSeq.
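A minimal sketch of one possible workaround, assuming (as the two tests above suggest) that the unversioned accession fetches the record you want: drop the version suffix before calling get_Seq_by_id. `strip_version` is a hypothetical helper, not part of BioPerl, and the unversioned lookup presumably returns the current version, so this does not help if you specifically need an older one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): remove a trailing ".<digits>"
# version suffix, e.g. "NP_457465.1" -> "NP_457465".
# Accessions without a version suffix are returned unchanged.
sub strip_version {
    my ($acc) = @_;
    (my $base = $acc) =~ s/\.\d+\z//;
    return $base;
}

for my $acc ('NP_457465.1', 'NP_457465') {
    print strip_version($acc), "\n";
}

# Assumed usage with BioPerl installed and network access, reusing the
# call that worked above:
#   use Bio::DB::RefSeq;
#   my $db  = Bio::DB::RefSeq->new;
#   my $seq = $db->get_Seq_by_id(strip_version('NP_457465.1'));
```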

Keep in mind that this discussion may be academic since Jason just migrated
all these modules to use eutils, so any difficulties we see with these
modules will vanish, we hope, in the next release. Or simply mutate to more
malevolent forms...

Brian O.


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On
Behalf Of Ali Al-Shahib
Sent: Thursday, June 13, 2002 11:00 AM
To: Brian Osborne
Cc: Stefan A Kirov; Bioperl
Subject: [Bioperl-l] Genbank Bioperl problem


Hi Brian

GenPept seemed to work for me, but I had to use 'id' instead of 'acc', so:
my $seq = $gb->get_Seq_by_id('NP_457465.1');
I also tried RefSeq, but it didn't work.

However, now I've run into another problem. I wanted to use the batch method
(my $seq = $gb->get_Seq_by_batch($filename);), but GenPept doesn't support
this. Do you have any ideas how I can solve this? I have a lot of NP_
accessions I need to fetch from NCBI, and it's impossible for me to do them
without a batch.
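A minimal sketch of the batching side, assuming the file holds one accession per line. `read_accessions`, `chunk` and the batch size of 50 are hypothetical choices, not BioPerl API. Each batch could then be handed to a stream-style call such as `get_Stream_by_id`, which takes an array reference of IDs in Bio::DB::WebDBSeqI-based modules; check that your installed Bio::DB::GenPept or Bio::DB::RefSeq supports it before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read accessions from a file handle, one per line,
# trimming whitespace and skipping blank lines and repeats (first kept).
sub read_accessions {
    my ($fh) = @_;
    my (%seen, @ids);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+\z//g;
        next if $line eq '' || $seen{$line}++;
        push @ids, $line;
    }
    return @ids;
}

# Hypothetical helper: split a list into array refs of at most $size items.
sub chunk {
    my ($size, @items) = @_;
    my @batches;
    push @batches, [ splice(@items, 0, $size) ] while @items;
    return @batches;
}

# Demo on an in-memory "file" so the sketch runs without network access.
my $data = "NP_457465\nNP_347647\n\nNP_457465\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my @ids = read_accessions($fh);
close $fh;

for my $batch (chunk(50, @ids)) {
    print join(',', @$batch), "\n";
    # Assumed BioPerl usage (needs network; verify your version has it):
    #   my $stream = $gb->get_Stream_by_id($batch);
    #   while (my $seq = $stream->next_seq) { ... }
}
```

Batching keeps the number of requests to NCBI down compared with one get_Seq_by_id call per accession; the batch size is arbitrary here.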

Thank you in advance.

Ali

On Thu, 13 Jun 2002, Brian Osborne wrote:

> Ali and Stefan,
>
> Accession numbers starting with NP_ are NCBI Reference Sequence (RefSeq)
> entries (see http://www.ncbi.nlm.nih.gov/LocusLink/RSfaq.html). From the
> Bioperl FAQ:
>
>   Q2.3: How can I get NT_ or NM_ accessions from NCBI (Reference
>       Sequences)?
>
>       Use Bio::DB::RefSeq, not Bio::DB::GenBank, when you are retrieving
>       the NM_ accessions. This is still an area of active development
>       because the data providers have not provided the best interface for
>       us to query. EBI has provided a mirror with their dbfetch system,
>       which is accessible through the Bio::DB::RefSeq object; however,
>       there are cases where NT_ accessions will not be retrievable.
>
> Bio::DB::GenPept won't work, and a one-liner using Bio::DB::RefSeq seemed to
> work. I'll change the FAQ so that it refers to NP_'s as well.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On
> Behalf Of Stefan A Kirov
> Sent: Wednesday, June 12, 2002 2:59 PM
> To: Ali Al-Shahib
> Cc: Bioperl
> Subject: Re: [Bioperl-l] Genbank Bioperl help
>
> Use Bio::DB::GenPept for proteins!
> Good luck!
> Stefan
>
> On Wed, 12 Jun 2002, Ali Al-Shahib wrote:
>
> >
> >Hi
> >
> >I've got a set of accession numbers, but they start with 'NP_' as they are
> >proteins. I've used the GenBank module (Bio::DB::GenBank) and written
> >the following script:
> >
> >#!/usr/local/bin/perl -w
> >
> >use Bio::DB::GenBank;
> >use Bio::Species;
> >my $gb = new Bio::DB::GenBank;
> >
> >#get a particular accession number
> >my $seq = $gb->get_Seq_by_acc('NP_347647');
> >
> >#get the species from the 'sequence' object
> >my $sp = $seq->species();
> >
> >#get the classification
> >my @class = $sp->classification();
> >
> >#print out the result, line by line
> >print join ("\n", @class), "\n";
> >
> >However, it works for accession numbers of nucleotide sequences but not of
> >protein sequences. How can I change the script to make it fetch the
> >organism name from GenBank using a protein accession number that starts
> >with 'NP_' (example: NP_347647.1)? It fetches accession numbers like
> >AC021953, but not 'NP_.....'.
> >
> >I would greatly appreciate it if you can answer my query.
> >
> >Thank you in advance
> >
> >Ali
> >--
> >Mr Ali Al-Shahib
> >Research Student
> >Bioinformatics Research Centre
> >Department of Computing Science
> >17 Lilybank Gardens
> >University of Glasgow
> >Glasgow G12 8QQ
> >Scotland, UK
> >Tel: 0141 330 2421 (direct)
> >E-mail: alshahib@dcs.gla.ac.uk
> >Web page: http://www.dcs.gla.ac.uk/~alshahib
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@bioperl.org
> >http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
