[Bioperl-l] extracting ORGANISM line from genbank file

Rohit Ghai ghai.rohit at gmail.com
Mon Aug 24 08:53:03 EDT 2009


Hi,

I think you forgot to add "seq" to the builder's wanted slots; that's why
the file is empty. Also, the species name is being parsed, but it never
appears in the output. Here's a version using fasta output that you can
probably customize further. It also takes the full name of the organism
and appends it to the description line in the output.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq::SeqBuilder;

my $usage = "genbank_to_fasta_cleaning.pl infile outfile\n";
my $infile        = shift or die $usage;
my $infileformat  = 'Genbank';
my $outfile       = shift or die $usage;
my $outfileformat = 'fasta';

my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
                             '-format' => $infileformat);

my $seq_out = Bio::SeqIO->new('-file'   => ">$outfile",
                              '-format' => $outfileformat);

# Build only the slots we need; without 'seq' no sequence is written.
my $builder = $seq_in->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('display_id', 'species', 'seq', 'description');

while (my $seq = $seq_in->next_seq()) {
    # Skip the annotation step for records without an organism entry.
    my $species = $seq->species();
    if ($species) {
        my $desc = $seq->description();
        $desc = '' unless defined $desc;
        # Append the full organism name to the description line.
        my $species_string = $species->binomial('FULL');
        $seq->description($desc . " [$species_string]");
    }
    $seq_out->write_seq($seq);
}

exit;
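
If you only need the organism name and not the sequences, another option is
to skip BioPerl and read the ORGANISM line straight from the flat file. This
is only a rough sketch in plain Perl: organism_names is a made-up helper name
(it is not part of BioPerl), the sample record is invented for illustration,
and it assumes the name fits on the ORGANISM line itself and does not wrap
onto the next line.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the text that
# follows each ORGANISM tag in a chunk of GenBank flat-file text.
sub organism_names {
    my ($text) = @_;
    my @names;
    for my $line (split /\n/, $text) {
        # ORGANISM is indented under SOURCE; the name follows the tag.
        # Assumption: the name does not continue on the following line.
        if ($line =~ /^\s+ORGANISM\s+(.+?)\s*$/) {
            push @names, $1;
        }
    }
    return @names;
}

# Invented record fragment, for illustration only.
my $sample = <<'END';
LOCUS       EXAMPLE01    120 bp    DNA     linear   PLN 01-JAN-2000
SOURCE      Arabidopsis thaliana (thale cress)
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta.
END

print "$_\n" for organism_names($sample);
```

For the sample above this prints the single line "Arabidopsis thaliana". For
a real file you would read its contents into a string first and pass that to
the helper.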


On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova <geoeco at rambler.ru> wrote:

>
> Dear all,
>
> I am trying to extract the species taxonomy from the ORGANISM line. In fact,
> I only need the first line under the ORGANISM tag (i.e. genus + species). I
> thought that it would be possible to do this with the SeqBuilder object by
> stating
>
> $builder->add_wanted_slot('display_id','species');
>
> The problem, however, is that I get an empty file as a result.
> What might be wrong with the script (see below)?
> Thanks a lot in advance for any ideas,
>
> -------------------------------------------
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank';
> my $outfile = shift or die $usage;
> my $outfileformat = 'raw';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                              '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
>                               '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species');
>
> while(my $seq = $seq_in->next_seq()) {
>     $seq_out->write_seq($seq);
> }
>
> exit;
>
> ----------------------------------------------------
>
> Anna