[Bioperl-l] extracting ORGANISM line from genbank file

Anna Kostikova geoeco at rambler.ru
Tue Aug 25 07:01:24 UTC 2009


Hi Rohit,

Thanks a lot for your comments. It actually worked well, but in fact I
only want to extract the species names, so that I have them in a separate
file alongside the fasta file with the sequences.
So, thanks a lot again!
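
For the archive, a minimal sketch of that idea (one pass over the GenBank
file, writing the sequences as fasta and the species names to a second
file). The names species_line, convert and the tab-separated
"ID<TAB>species" layout are only assumptions for illustration, not
something from the script above, and binomial() gives genus + species,
which may not match the ORGANISM line exactly for every record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one tab-separated line, sequence ID then species name.
sub species_line {
    my ($id, $name) = @_;
    return join("\t", $id, $name) . "\n";
}

# Hypothetical helper: read GenBank records, write fasta to $fasta_file
# and one "ID<TAB>species" line per record to $names_file.
sub convert {
    my ($infile, $fasta_file, $names_file) = @_;
    require Bio::SeqIO;    # loaded here so species_line() works without BioPerl

    my $seq_in  = Bio::SeqIO->new(-file => "<$infile",     -format => 'genbank');
    my $seq_out = Bio::SeqIO->new(-file => ">$fasta_file", -format => 'fasta');
    open(my $names, '>', $names_file) or die "Cannot write $names_file: $!";

    while (my $seq = $seq_in->next_seq()) {
        my $species = $seq->species();
        # Assumption: records without a species object are labelled 'unknown'.
        my $name = $species ? $species->binomial() : 'unknown';
        print {$names} species_line($seq->display_id(), $name);
        $seq_out->write_seq($seq);
    }
    close($names) or die "Cannot close $names_file: $!";
}

# Convert only when all three file names are given; otherwise show usage.
if (@ARGV == 3) {
    convert(@ARGV);
}
else {
    print STDERR "usage: $0 infile.gb outfile.fasta species_names.txt\n";
}
```

Called with three file names, it should leave the fasta file and the
species list side by side, with matching IDs in both.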

Anna

* Rohit Ghai <ghai.rohit at gmail.com> [Mon, 24 Aug 2009 14:53:03 +0200]:
> hi
>
> I think you forgot to add the "seq" slot in the builder; that's why the
> file is empty. Also, the species name, though being parsed, is nowhere
> in the output. Here's a version using fasta output that you can probably
> customize further. It also takes the full name of the organism and adds
> it to the description line in the output.
>
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank';
> my $outfile = shift or die $usage;
> my $outfileformat = 'fasta';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
>                              '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file'   => ">$outfile",
>                               '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species','seq','description');
>
> while (my $seq = $seq_in->next_seq()) {
>     my $desc = $seq->description();
>     my $species_string = $seq->species()->binomial('FULL');
>     $desc = $desc . " [$species_string]";
>     $seq->description($desc);
>     $seq_out->write_seq($seq);
> }
>
> exit;
>
>
> On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova <geoeco at rambler.ru>
> wrote:
>
> >
> > Dear all,
> >
> > I am trying to extract species taxonomy from the ORGANISM line. In fact
> > I only need the first line under the ORGANISM tag (i.e. genus +
> > species). I thought that it would be possible to do this with the
> > SeqBuilder object by stating
> >
> > $builder->add_wanted_slot('display_id','species');
> >
> > the problem is, however, that I've got an empty file as a result.
> > What might be wrong with the script (see below)?
> > Thanks a lot in advance for any ideas,
> >
> > -------------------------------------------
> >
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Seq::SeqBuilder;
> >
> > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> > my $infile = shift or die $usage;
> > my $infileformat = 'Genbank';
> > my $outfile = shift or die $usage;
> > my $outfileformat = 'raw';
> > my $i = 0;
> >
> > my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
> >                              '-format' => $infileformat);
> >
> > my $seq_out = Bio::SeqIO->new('-file'   => ">$outfile",
> >                               '-format' => $outfileformat);
> >
> > my $builder = $seq_in->sequence_builder();
> >
> > $builder->want_none();
> > $builder->add_wanted_slot('display_id','species');
> >
> > while (my $seq = $seq_in->next_seq()) {
> >     $seq_out->write_seq($seq);
> > }
> >
> > exit;
> >
> > ----------------------------------------------------
> >
> > Anna