[Bioperl-l] Bio::DB::GenPept - get_Seq_by_id

Jason Stajich jason.stajich at duke.edu
Sat Apr 16 12:12:38 EDT 2005


If you specify -verbose => 1 when initializing an object you will often 
see debugging statements.

$sp = Bio::DB::GenPept->new(-verbose => 1);


You will see that the URLs generated to fetch the sequence are proper 
(we're not truncating the id or anything). After playing around, it 
looks like we have to put the ID in quotes if it starts with a number; 
otherwise the server assumes it is a GI number. I think this might be 
an NCBI shortcut?

In the short term, you can just quote your id strings when they start 
with a number:
$id = "\"$id\"" if $id =~ /^\d/;
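Here is a minimal sketch that wraps the one-liner above in a reusable sub. quote_id is a hypothetical helper, not part of BioPerl. Like the one-liner, it quotes any id that starts with a digit, including plain GI numbers. It also leaves ids alone if they already carry quotes, so calling it twice is harmless. The sketch does not touch the network or need BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): wrap an id in double quotes
# when it starts with a digit, so a name like 2AAA_YEAST is not read as
# a GI number. Ids that are already quoted are returned unchanged.
sub quote_id {
    my ($id) = @_;
    return $id if $id =~ /^".*"$/;          # already quoted, leave as-is
    return $id =~ /^\d/ ? qq{"$id"} : $id;  # quote only leading-digit ids
}

for my $id (qw(ROA1_HUMAN 2AAA_YEAST AAP1_YEAST)) {
    print quote_id($id), "\n";
}

# With BioPerl, you would then call something like:
#   my $seq = $sp->get_Seq_by_id(quote_id($query));
```

This prints ROA1_HUMAN, "2AAA_YEAST" and AAP1_YEAST on separate lines.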

We'll add code in the modules to detect and fix these automatically -- 
quoting GI numbers doesn't seem to cause problems so maybe we should 
quote every id?


If you are only querying swissprot data you might find 
Bio::DB::SwissProt useful as well.

Bio::DB::NCBIHelper was updated in CVS to quote the ids before making 
the URL for the query.
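The following is a rough sketch of the "quote every id" idea applied while building a query URL. It is not the code that went into Bio::DB::NCBIHelper. The E-utilities efetch endpoint and its db/rettype parameters are assumptions made for illustration. url_escape and build_query_url are hypothetical helpers. Each id, GI numbers included, is wrapped in double quotes and then percent-encoded, so the quotes travel as %22.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed endpoint and parameters, for illustration only; not necessarily
# the URL that Bio::DB::NCBIHelper actually builds.
my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Percent-encode every character outside the RFC 3986 unreserved set.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Quote every id (names and GI numbers alike), escape it, and join the
# results with commas into a single id parameter.
sub build_query_url {
    my @ids = @_;
    my $id_param = join ',', map { url_escape(qq{"$_"}) } @ids;
    return "$BASE?db=protein&rettype=gp&id=$id_param";
}

print build_query_url('2AAA_YEAST', '728771'), "\n";
# prints .../efetch.fcgi?db=protein&rettype=gp&id=%222AAA_YEAST%22,%22728771%22
```

Quoting unconditionally avoids having to guess which ids are GI numbers. It only works if quoting GI numbers really is harmless, which is how it looks so far.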

I also put some fixes into CVS that better parse the swissprot fields 
from DBSOURCE (in Bio/SeqIO/genbank), although it is always better to 
get this information from the original swissprot records, since some 
munging happens in the transfer process.
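For a rough idea of what pulling swissprot fields out of DBSOURCE involves, here is a sketch that extracts the entry name (locus) and accession. The sample line and its layout are assumptions for illustration. Real GenPept records may format DBSOURCE differently or spread it over several lines. parse_dbsource is a hypothetical helper, not the Bio::SeqIO::genbank code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed DBSOURCE text for a GenPept record derived from swissprot;
# this sample line is written for illustration only.
my $dbsource = 'swissprot: locus ROA1_HUMAN, accession P09651;';

# Hypothetical helper: return a hashref with the swissprot locus and
# accession, or an empty hashref if the line does not match the
# assumed layout.
sub parse_dbsource {
    my ($line) = @_;
    my %f;
    if ($line =~ /^\s*swissprot:\s+locus\s+([^,\s]+),\s+accession\s+([^;\s]+)/i) {
        @f{qw(locus accession)} = ($1, $2);
    }
    return \%f;
}

my $fields = parse_dbsource($dbsource);
print "$fields->{locus}\t$fields->{accession}\n";
```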

-jason

On Apr 15, 2005, at 8:24 PM, Jamie Sherman wrote:

> I'm getting really odd behavior when I use get_Seq_by_id to retrieve 
> from GenPept. I'm trying to retrieve by name, where the name is like 
> 'ROA1_HUMAN'. When I have a name that starts with a letter it works 
> great, but for names that start with a number it returns junk. Is there 
> a workaround for this, or am I doing something wrong? Can I pass an 
> argument to Bio::DB::GenPept->new to specify the search type?
>      Thanks,
> 	--Jamie
>
>
> Program:
> #!/usr/bin/perl -w
>
> use Bio::DB::GenPept;
> $sp = Bio::DB::GenPept->new;
>
> # worked $query = 'AAP1_YEAST';
> # worked $query = "ROA1_HUMAN";
> $query = "2AAA_YEAST";  #doesn't work?
>
> $seq = $sp->get_Seq_by_id($query);
> print $seq->desc . "\n";
> print $seq->primary_id . "\n";
>
>
> Output:
> [AAP1_YEAST]
> Alanine/arginine aminopeptidase.
> 728771
>
> [ROA1_HUMAN]
> Heterogeneous nuclear ribonucleoprotein A1 (Helix-destabilizing 
> protein) (Single-strand binding protein) (hnRNP core protein A1).
> 133254
>
> [2AAA_YEAST]
> B.taurus DNA sequence 1 from patent application EP0238993.
> 2
>
> It is using 2 as the ID number. How do I escape this?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


