[Bioperl-l] Is bio-perl right?

Ewan Birney birney@ebi.ac.uk
Fri, 29 Sep 2000 08:21:03 +0100 (GMT)


On 28 Sep 2000, John S. J. Anderson wrote:

> Greetings --
> 
> I'm trying to decide if bio-perl provides the tools I need to do
> something, or if I'm better off rolling my own custom solution.
> 
> Basically, I want to retrieve a large number (hundreds) of sequence
> files from Genbank (via Entrez, so there are a number of potential
> formats) and then parse each file according to the header
> information. I need to split the sequence in each file into coding and
> non-coding, and I would like to map each segment back onto a genome
> (probably by tracking location relative to ORF starts and stops).
> 
> I know there's been some traffic on the list recently about the
> difficulty of sufficiently generalizing the GenBank format via a
> bio-perl parser, but I haven't played around with the code at all. (To
> cross with another thread, the documentation (and lack of time) has
> been the biggest barrier to my picking up bio-perl.)
> 

I think bioperl should get you 75% of the way there:

   Picking up hundreds of sequence files - ok

   Parsing GenBank (latest 0.6.2 candidate release is the one to go for) - ok

   Then the basic loop is going to go something like


   # script for looping over genbank entries, printing out
   # start-end of CDS exons


   use strict;
   use Bio::SeqIO;
   use Bio::Seq; # don't really need this, because Bio::SeqIO uses it

   # read GenBank entries from STDIN; any open filehandle will do here
   my $seqio = Bio::SeqIO->new('-format' => 'GenBank', '-fh' => \*STDIN);

   while( my $seq = $seqio->next_seq ) {
      foreach my $feat ( $seq->top_SeqFeatures ) {
          if( $feat->primary_tag eq 'CDS_span' ) {
             # feature is a CDS line with a join statement
             foreach my $sub ( $feat->sub_SeqFeature ) {
                print "start ",$sub->start," end ",$sub->end,"\n";
                # do what you like
             }
          } elsif ( $feat->primary_tag eq 'CDS' ) {
             # feature is a CDS line without a join statement
             # yes - this part is potentially badly designed in bioperl!
             print "start ",$feat->start," end ",$feat->end,"\n";
          }
      }
   }
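
For the coding/non-coding split, one possible approach (a rough sketch only,
not something bioperl does for you) is to collect the start/end pairs printed
by the loop above and take the complement over the length of the entry. The
helper name noncoding_spans and the coordinates below are made up for
illustration, and the sketch ignores strand entirely. It is plain Perl and
needs no bioperl modules.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): given the sequence length and
# a list of [start, end] coding spans (1-based, inclusive, as returned by
# ->start and ->end), return the complementary non-coding spans.
sub noncoding_spans {
    my ($length, @coding) = @_;

    # sort by start so overlapping or out-of-order spans are handled
    my @sorted = sort { $a->[0] <=> $b->[0] } @coding;

    my @gaps;
    my $next = 1;    # first position not yet covered by a coding span
    foreach my $span (@sorted) {
        my ($start, $end) = @$span;
        push @gaps, [ $next, $start - 1 ] if $start > $next;
        $next = $end + 1 if $end + 1 > $next;
    }

    # whatever is left after the last coding span
    push @gaps, [ $next, $length ] if $next <= $length;
    return @gaps;
}

# Made-up exon coordinates on a 600 bp entry, purely for illustration
my @exons = ( [ 101, 250 ], [ 400, 520 ], [ 200, 300 ] );
foreach my $gap ( noncoding_spans( 600, @exons ) ) {
    print "non-coding ", $gap->[0], " ", $gap->[1], "\n";
}
```

In a real script you would push [ $sub->start, $sub->end ] onto the list
inside the feature loop instead of hard-coding the pairs.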

 


> So, is bio-perl the Right Thing for this job, or should I look into
> developing my own stuff?
> 

I would hope that Bioperl is "the right thing".

Give it a whirl and I'd be interested to hear about your experiences. Feel
free to write them up directly in the Wiki docs as well, at

http://bio.perl.org/wiki/html/BioPerl/FrontPage.html

(choose perhaps BioperlGettingStarted and then just click "edit page" and
you are away).


I will add this mini-script to the wiki docs myself... ;)




> Thanks for any advice,
> john.
> 
> -- 
> ------------------------------------------------------------------------
> John S J Anderson                                           NCBI,NLM,NIH
> IRTA Fellow                                              Bldg 38A, B2N14  
> janderso@ncbi.nlm.nih.gov                                   301.594.6087
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------