[Bioperl-l] extracting ORGANISM line from genbank file

Anna Kostikova geoeco at rambler.ru
Tue Aug 25 07:09:43 UTC 2009


Hello Hilmar,

Thanks for your comments.
Actually, my final aim is to get two files: the first is a FASTA file with 
all the sequences, and the second is simply a list of species names
extracted from the same GenBank file. That's why I thought it would be 
a good idea to put everything together in one script using BioPerl objects.
Is there a better way to do it? The reason I don't want to simply parse 
the text for species names is that I also want to be able to check which 
gene has been sequenced:

while (my $inseq = $seq_in->next_seq) {
    if ($inseq->desc =~ m/5\.8S ribosomal RNA/) {
        $seq_out->write_seq($inseq);
    }
}


Only if it is 5.8S rRNA do I want to extract the species name and the 
sequence. I thought that direct parsing would need much more code.
Am I wrong?
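
For illustration, here is a minimal sketch of what the combined script might
look like, writing the FASTA file and a separate two-column table (display_id
and species name) in one pass. The sub names is_5_8s and extract_5_8s, the
three-argument command line and the 'unknown' placeholder are assumptions made
for this sketch, not part of any existing script, and it has not been run
against real GenBank data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: true if the description mentions 5.8S rRNA.
sub is_5_8s {
    my ($desc) = @_;
    return (defined($desc) && $desc =~ m/5\.8S ribosomal RNA/) ? 1 : 0;
}

# Hypothetical driver: read a GenBank file, write matching records as FASTA
# and a tab-separated table of display_id and binomial species name.
sub extract_5_8s {
    my ($infile, $fasta_file, $table_file) = @_;

    my $seq_in  = Bio::SeqIO->new(-file => "<$infile",     -format => 'genbank');
    my $seq_out = Bio::SeqIO->new(-file => ">$fasta_file", -format => 'fasta');
    open my $table, '>', $table_file or die "Cannot write $table_file: $!";

    while (my $seq = $seq_in->next_seq) {
        next unless is_5_8s($seq->desc);
        $seq_out->write_seq($seq);

        # species() returns a Bio::Species object, or undef if none was parsed
        my $species = $seq->species;
        my $name    = $species ? $species->binomial : 'unknown';
        print {$table} join("\t", $seq->display_id, $name), "\n";
    }
    close $table or die "Cannot close $table_file: $!";
    return;
}

# Run only when called with: infile fasta_outfile table_outfile
extract_5_8s(@ARGV) if @ARGV == 3;
```

No SeqBuilder is used here, so each record is parsed in full; that keeps the
sequence available for the FASTA output at the cost of some speed.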

I am a newbie in both BioPerl and bioinformatics, so all comments would 
be appreciated :)

Anna


* Hilmar Lapp <hlapp at gmx.net> [Mon, 24 Aug 2009 10:47:34 -0400]:
> Hi Anna,
>
> sequence formats all have some varying amount of information that must
> be present or otherwise the syntax is invalid. If what you need is a
> two-column table of display_id and species name, then I would simply
> write that, and not squeeze it into a standard sequence format.
> (Unless you actually do want the sequence too, in which case you need
> to add it as a wanted slot; even in that case though, writing a three-
> column table might serve you better.)
>
> 	-hilmar
>
> On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:
>
> >
> > Dear all,
> >
> > I am trying to extract species taxonomy from the ORGANISM line. In fact
> > I only need the first line under the ORGANISM tag (i.e. genus + species).
> > I thought that it would be possible to do this with the SeqBuilder object
> > by stating
> >
> > $builder->add_wanted_slot('display_id','species');
> >
> > the problem is, however, that I've got an empty file as a result.
> > What might be wrong with the script (see below)?
> > Thanks a lot in advance for any ideas,
> >
> > -------------------------------------------
> >
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Seq::SeqBuilder;
> >
> > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> > my $infile = shift or die $usage;
> > my $infileformat = 'Genbank';
> > my $outfile = shift or die $usage;
> > my $outfileformat = 'raw';
> > my $i = 0;
> >
> > my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
> >                              '-format' => $infileformat);
> >
> > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
> >                               '-format' => $outfileformat);
> >
> > my $builder = $seq_in->sequence_builder();
> >
> > $builder->want_none();
> > $builder->add_wanted_slot('display_id','species');
> >
> > while (my $seq = $seq_in->next_seq()) {
> >     $seq_out->write_seq($seq);
> > }
> >
> > exit;
> >
> > ----------------------------------------------------
> >
> > Anna
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>




