[Bioperl-l] get cds from genbank record

Heikki Lehvaslaiho heikki at nildram.co.uk
Fri Jan 16 15:12:01 EST 2004


Alberto,

I do not get it. The CGI module should not be called from this script, not 
from any standard bioperl module. Could you please check local your setupand 
make sure that perl can find all the modules. Even better: use a debugger to 
follow the exacution of the code to see ehere is goes wrong.

If you run the following script like this:
   perl /path/to/script /path/to/bioperlcore/t/data/NM_002254.gb

You should get the output:

>gi|GI:19923140|gb|KIF3C|NP_002245.2
MASKTKASEALKVVARCRPLSRKEEAAGHEQILTMDVKLGQVTLRNPRAAPGELPKTFTF
DAVYDASSKQADLYDETVRPLIDSVLQGFNGTVFAYGQTGTGKTYTMQGTWVEPELRGVI
PNAFEHIFTHISRSQNQQYLVRASYLEIYQEEIRDLLSKEPGKRLELKENPETGVYIKDL
SSFVTKNVKEIEHVMNLGNQTRAVGSTHMNEVSSRSHAIFIITVECSERGSDGQDHIRVG
KLNLVDLAGSERQNKAGPNTAGGAATPSSGGGGGGGGSGGGAGGERPKEASKINLSLSAL
GNVIAALAGNRSTHIPYRDSKLTRLLQDSLGGNAKTIMVATLGPASHSYDESLSTLRFAN
RAKNIKNKPRVNEDPKDTLLREFQEEIARLKAQLEKRGMLGKRPRRKSSRRKKAVSAPPG
YPEGPVIEAWVAEEEDDNNNNHRPPQPILESALEKNMENYLQEQKERLEEEKAAIQDDRS
LVSEEKQKLLEEKEKMLEDLRREQQATELLAAKYKAMESKLLIGGRNIMDHTNEQQKMLE
LKRQEIAEQKRREREMQQEMMLRDEETMELRGTYTSLQQEVEVKTKKLKKLYAKLQAVKA
EIQDQHDEYIRVRQDLEEAQNEQTRELKLKYLIIENFIPPEEKNKIMNRLFLDCEEEQWK
FQPLVPAGVSSSQMKKRPTSAVGYKRPISQYARVAMAMGSHPRYRAENIMFLELDVSPPA
VFEMEFSHDQEQDPRALHMERLMRLDSFLERPSTSKVRKSRSWCQSPQRPPPSTTHASLA
SASLRPATVADHE


-------------------------------------------------------------------
#!/usr/sbin/perl -w
#use strict;
use Bio::SeqIO;
use Bio::Seq;

my $in = new Bio::SeqIO(-file => "$ARGV[0]", -format => 'genbank');
#my $out = new Bio::SeqIO(-file => ">$ARGV[0].tfa", -format => 'fasta');
my $out = new Bio::SeqIO(-format => 'fasta');
while ( my $seq = $in->next_seq ) {

    foreach my $f ( grep { $_->primary_tag eq 'CDS' }
                    $seq->top_SeqFeatures ) {
        my ($gname);
        if ( $f->has_tag('gene') ) {
            ($gname) = $f->each_tag_value('gene');
        } elsif ( $f->has_tag('product') ) {
            ($gname) = $f->each_tag_value('product');
        }
        my ($ref) = $f->has_tag('protein_id') && 
$f->each_tag_value('protein_id');
        my ($gi)  = $f->has_tag('db_xref') && $f->each_tag_value('db_xref');

        my ($translation) = $f->has_tag('translation') &&
            $f->each_tag_value('translation');

        unless( $gi && $ref && $gname && $translation ) {
            print STDERR "not fully annotated CDS ($gi,$ref,$gname), 
skipping...\n";
            next;
        }
        my $outseq = Bio::PrimarySeq->new
            (-seq => $translation,
             -display_id =>
             sprintf("gi|%s|gb|%s|%s",$gi,$gname,$ref));
        $out->write_seq($outseq);
    }
}

-------------------------------------------------------------------

The other problem you had:
I 've pointed to our server maintainer, Chris,  your problem, but I suspect 
those warnings about potential spam in your mails do not come from bioperl 
servers or the way they are set up.

	-Heikki


On Friday 16 Jan 2004 15:39, Pedro Antonio Reche wrote:
> Dear Heikki,
> thanks for your replay. Actually I have tried to use that before and
> still does not work. The program still quits with the same error
> message.
> Use of uninitialized value in -d at /usr/share/perl/5.6.1/CGI.pm line
> 3327.
> [Fri Jan 16 10:29:50 2004] bio_get_cds.pl: Filehandle
> Bio::Tools::GuessSeqFormat::$fh opened only for output at
> /usr/local/share/perl/5.6.1/Bio/Tools/GuessSeqFormat.pm line 472.
> Content-type: text/html
>
> Content-Type: text/html; charset=ISO-8859-1
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html
>          PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
>          "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml"
> lang="en-US"><head><title>Error</title>
> </head><body><h1>Error</h1><p>Sorry, the following error has occurred:
> </p><p><i>Can't locate object method &quot;throw&quot; via package
> &quot;Bio::Root::Exception&quot; (perhaps you forgot to load
> &quot;Bio::Root::Exception&quot;?) at
> /usr/local/share/perl/5.6.1/Bio/Root/Root.pm line 328.
>
> by the way every time I execute one of the bioperl working scripts I
> get the annoying message of the first line:
> Use of uninitialized value in -d at /usr/share/perl/5.6.1/CGI.pm line
> 3327.
> Do you know why is that.
>
> One more thing, following my first post I got an email from from
> quarantine at maxstrengthmail.com on behalf of the bioperl mailing list
> saying that my e-mail was in quarantine, and asking me press a link to
> get the message deliver. Oddly, I could see the message already posted
> and therefore I wonder if this was a legitimate e-mail or some type so
> scang.  Please advice. Here it follows the message I got.
> Dear Pedro Antonio Reche <reche at research.dfci.harvard.edu>:
>
> I am an automated email sentry designed to protect against unwanted
> email.
>
> You recently sent an email titled '[Bioperl-l] get cds from genbank
> record' to 'bioperl-l'.
>
> Your email message has triggered anti-SPAM concerns and it has been put
> into a holding area.  We need you to confirm that it is not SPAM so it
> can
> be forwarded to its recipient.  It is a very simple process, please use
> a
> browser to go to this location:
>
> http://www.maxstrengthmail.com/quarantine/release?emailid=112.20040115
> -153106.001.350964&ask=yes
>
> Just click the checkbox and hit submit, and your mail will be delivered.
>
> The SPAM conditions that were raised were:
>       * SuspiciousMessageContent_001
>       * SuspiciousMessageContent_011
>
> Adjusting SPAM filters is a continual process and we apologize that
> your legitimate email has been caught.  By forwarding your email
> via this form, your confirmation will be logged and the email filters
> will be adjusted accordingly.
>
> Thank you for your patience with this process.
>
> MaxStrengthMail 1.0_20021213
>
> On Jan 16, 2004, at 2:50 AM, Heikki Lehvaslaiho wrote:
> > Perdo,
> >
> > It should work again if you spesify the sequence format for the output:
> >
> >   my $out = new Bio::SeqIO(-file => ">$ARGV[0].tfa", -format =>
> > 'fasta');
> >
> > It should have been spesified even before 1.4, I an surprised that it
> > worked
> > without it.
> >
> > 	-Heikki
> >
> > On Thursday 15 Jan 2004 20:35, Pedro Antonio Reche wrote:
> >> !/usr/sbin/perl -w
> >> #use strict;
> >> use Bio::SeqIO;
> >> use Bio::Seq;
> >>
> >>
> >> my $in = new Bio::SeqIO(-file => "$ARGV[0]", -format => 'genbank');
> >> my $out = new Bio::SeqIO(-file => ">$ARGV[0].tfa");
> >>
> >> while( my $seq = $in->next_seq ) {
> >>
> >>     foreach my $f ( grep { $_->primary_tag eq 'CDS' }
> >>        $seq->top_SeqFeatures ) {
> >>     my ($gname);
> >>     if( $f->has_tag('gene') ) {
> >>         ($gname) = $f->each_tag_value('gene');
> >>     } elsif( $f->has_tag('product') ) {
> >>         ($gname) = $f->each_tag_value('product');
> >>     }
> >>
> >>     my ($ref) = $f->has_tag('protein_id') &&
> >> $f->each_tag_value('protein_id');
> >>     my ($gi)  = $f->has_tag('db_xref') &&
> >> $f->each_tag_value('db_xref');
> >>
> >>     my ($translation) = $f->has_tag('translation') &&
> >>                           $f->each_tag_value('translation');
> >>
> >>     unless( $gi && $ref && $gname && $translation ) {
> >>       print STDERR "not fully annotated CDS ($gi,$ref,$gname),
> >> skipping...\n";
> >>       next;
> >>     }
> >>     my $outseq = Bio::PrimarySeq->new(-seq => $translation,
> >>                                       -display_id =>
> >>             sprintf("gi|%s|gb|%s|%s",$gi,$gname,$ref));
> >>     $out->write_seq($outseq);
> >>   }
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/                      http://www.ebi.ac.uk/mutations/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
> >     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
> >    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
> >   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
> >      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> > ___ _/_/_/_/_/________________________________________________________
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> ========================================================================
> ======
> Pedro A Reche, PhD
> MIF Bioinformatics Group Head
> Dana-Farber Cancer Institute  (D1510A)
>                                              TL: 617 632 3824
> Harvard Medical School
>                                                           FX: 617 632
> 3351
> 44 Binney Street ,
>                              EM: reche at research.dfci.harvard.edu
> Boston, MA 02115, USA
>                                         W3: www.mifoundation.org

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________



More information about the Bioperl-l mailing list