[Bioperl-l] get_sequence() gets some sequences but not others

Brian Osborne bosborne11 at verizon.net
Wed Jun 20 18:59:39 UTC 2007


Kurt,

I can't answer your question, but I wouldn't use Bio::Perl myself. I'd use
Bio::DB::GenPept:

501 ~>perl -e 'use Bio::DB::GenPept; $db = Bio::DB::GenPept->new; $seq =
$db->get_Seq_by_acc("NEM1_YEAST"); print $seq->seq;'
MNALKYFSNHLITTKKQKKINVEVTKNQDLLGPSKEVSNKYTSHSENDCVSEVDQQYDHSSSHLKESDQNQERKNS
VPKKPKALRSILIEKIASILWALLLFLPYYLIIKPLMSLWFVFTFPLSVIERRVKHTDKRNRGSNASENELPVSSS
NINDSSEKTNPKNCNLNTIPEAVEDDLNASDEIILQRDNVKGSLLRAQSVKSRPRSYSKSELSLSNHSSSNTVFGT
KRMGRFLFPKKLIPKSVLNTQKKKKLVIDLDETLIHSASRSTTHSNSSQGHLVEVKFGLSGIRTLYFIHKRPYCDL
FLTKVSKWYDLIIFTASMKEYADPVIDWLESSFPSSFSKRYYRSDCVLRDGVGYIKDLSIVKDSEENGKGSSSSLD
DVIIIDNSPVSYAMNVDNAIQVEGWISDPTDTDLLNLLPFLEAMRYSTDVRNILALKHGEKAFNIN
502 ~>

It's true that Bio::Perl is easy to use, but it's also _very_ limited.
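
For the loop in your script, the same approach might look roughly like the
sketch below. It is untested against your full ID list and rests on a few
assumptions: the input file holds one ID per line, and the helper names
read_ids and fetch_all are made up for illustration. Each lookup is wrapped
in eval so that one failed ID is reported without stopping the run. The
whitespace trimming is only there to rule out stray characters in the input
file; I'm not claiming that is what goes wrong with NEM1_YEAST.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenPept;

# Read one ID per line from a filehandle, skipping blank lines.
# (read_ids is a hypothetical helper name, not part of BioPerl.)
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while ( my $line = <$fh> ) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace, including \r\n
        push @ids, $line if length $line;
    }
    return @ids;
}

# Look up each ID in GenPept. A failed lookup is reported with warn
# and skipped. (fetch_all is also a hypothetical helper name.)
sub fetch_all {
    my ( $db, @ids ) = @_;
    my %seq_for;
    for my $id (@ids) {
        my $seq = eval { $db->get_Seq_by_acc($id) };
        if ($seq) {
            $seq_for{$id} = $seq;
        }
        else {
            warn "Could not retrieve $id: ", ( $@ || "no sequence returned\n" );
        }
    }
    return %seq_for;
}

my $file = shift @ARGV;
if ( defined $file ) {
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    my @ids = read_ids($in);
    close $in;

    my $db      = Bio::DB::GenPept->new;
    my %seq_for = fetch_all( $db, @ids );

    for my $id ( sort keys %seq_for ) {
        my $seq = $seq_for{$id};
        print "Accession of $id is ", $seq->accession_number,
            " (length ", $seq->length, ")\n";
    }
}
else {
    warn "Usage: $0 id_list.txt\n";
}
```

Run it as "perl fetch_ids.pl ids.txt" (the script name is arbitrary). The
sequence objects in %seq_for can then be handed to whatever BLAST step you
use next.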

Brian O.


On 6/20/07 2:11 PM, "Wollenberg, Kurt (NIH/NIAID)"
<wollenbergk at mail.nih.gov> wrote:

> Greetings:
> 
> I am working on a script to take a list of sequence IDs, extract the
> sequences from GenPept, and then run a BLAST search for each of the
> retrieved sequences. I am having a problem with the sequence retrieval,
> where some sequences are found and others are not and it's not obvious to me
> why this is. 
> 
> For example, using a text file containing the two following IDs as input:
> SKG3_YEAST
> NEM1_YEAST
> 
> My script 
> 
> while( <IN> ) {
>   chomp;
>   my $seqid = $_;
>   my $seq_obj = get_sequence( 'genpept', $seqid );
> }
> 
> will create a sequence object for the first ID, (print "Accession of
> ",$seqid," is ",$seq_obj->accession, "\n"; gives me the correct accession
> number) but for the second I am told
> 
> -------------------- WARNING ---------------------
> MSG: id (NEM1_YEAST) does not exist
> ---------------------------------------------------
> 
> When I pull up these records using the Entrez cross-database search in my web
> browser I find genpept records for both SKG3_YEAST and NEM1_YEAST (using
> these search terms). In both records these IDs reside in the same field
> ("DBSOURCE    swissprot: locus") so I'm mystified why get_sequence finds one
> but not the other. Any advice would be greatly appreciated.
> 
> Cheers,
> Kurt Wollenberg, Ph.D.
> Phylogenetics and Sequence Analysis Consultant
> Biocomputing Research Consulting Section
> Bioinformatics and Scientific IT Program (BSIP)
> NIH/NIAID/OTIS
> Contractor, Lockheed Martin
> http://bioinformatics.niaid.nih.gov
> 




