[Bioperl-l] extracting ORGANISM line from genbank file
Hilmar Lapp
hlapp at gmx.net
Mon Aug 24 10:47:34 EDT 2009
Hi Anna,
Sequence formats all require some minimum amount of information to be
present; otherwise the syntax is invalid. If what you need is a
two-column table of display_id and species name, then I would simply
write that, and not squeeze it into a standard sequence format.
(Unless you actually do want the sequence too, in which case you need
to add it as a wanted slot; even in that case though, writing a
three-column table might serve you better.)
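
As a rough illustration of the two-column table idea, here is a minimal
sketch. It assumes BioPerl is installed and that each record has a parsable
ORGANISM line. The names table_row and genbank_to_table, the tab separator,
and the 'unknown' placeholder are choices made for this example only; they
are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one table row: display_id, a tab, then the species name.
# (Hypothetical helper for this sketch; not a BioPerl function.)
sub table_row {
    my ($display_id, $name) = @_;
    return join("\t", $display_id, $name) . "\n";
}

# Read a GenBank file and write a display_id / species table.
sub genbank_to_table {
    my ($infile, $outfile) = @_;
    require Bio::SeqIO;    # loaded lazily; needs BioPerl installed

    my $seq_in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');

    # Optional: parse only the slots we need, as in the original script.
    my $builder = $seq_in->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('display_id', 'species');

    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";
    while (my $seq = $seq_in->next_seq()) {
        my $species = $seq->species();    # Bio::Species object, may be undef
        # binomial() gives genus + species; 'unknown' is this sketch's fallback
        my $name = $species ? $species->binomial() : 'unknown';
        print {$out} table_row($seq->display_id(), $name);
    }
    close($out) or die "Cannot close $outfile: $!";
    return;
}

# Run only when exactly two arguments (infile, outfile) are given.
genbank_to_table(@ARGV) if @ARGV == 2;
```

Invoked as "perl genbank_to_table.pl infile outfile" (the script name is
made up here), this should write one line per record. Writing the table
directly avoids the empty output: the 'raw' format prints only the sequence,
which was not among the wanted slots.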
-hilmar
On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:
>
> Dear all,
>
> I am trying to extract species taxonomy from the ORGANISM line. In fact
> I only need the first line under the ORGANISM tag (i.e. genus + species).
> I thought that it would be possible to do with the SeqBuilder object
> by stating
>
> $builder->add_wanted_slot('display_id','species');
>
> the problem is, however, that I've got an empty file as a result.
> What might be wrong with the script (see below)?
> Thanks a lot in advance for any ideas,
>
> -------------------------------------------
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank' ;
> my $outfile = shift or die $usage;
> my $outfileformat = 'raw';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
> '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
> '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species');
>
> while(my $seq = $seq_in->next_seq()) {
> $seq_out->write_seq($seq);
> }
>
> exit;
>
> ----------------------------------------------------
>
> Anna
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================