[Bioperl-l] avoiding feature parsing

Danny Rice dwrice at indiana.edu
Fri Jun 10 01:41:46 EDT 2005


I'm cranking through a bunch of GenBank or FASTA files, each named by its 
NCBI GI number.  Parsing all the feature info in the large GenBank files 
takes a huge amount of time, but I am only interested in the sequence.  
I've looked at the modules and read the docs but haven't found good 
documentation on how to read a GenBank file without parsing all the 
feature info.  I tried

my $seqio = Bio::SeqIO->new(-file => "$dir/$gi", -format => "fasta");

and to my surprise it seems to parse the GenBank files correctly but 
only gets the sequence, which seems to solve the problem.  My questions 
are: is this the expected behavior, can I rely on it working, and is 
there any documentation on it?  I suppose the parser takes this to mean 
"I'm only interested in the sequence, but go ahead and figure out the 
format of the input file if it isn't already in FASTA format."  If there 
is a more standard or faster way to just get the sequence from a GenBank 
file, I'd be interested in that also.
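
For comparison, here is a rough sketch of the hand-rolled fallback I would 
otherwise use.  It is plain Perl, not BioPerl, and genbank_sequence is a 
hypothetical helper name of my own.  It assumes the usual flat-file layout: 
an ORIGIN line, then numbered sequence lines, then a closing "//".  It only 
reads the first record in the file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the raw sequence of the
# first record in a GenBank flat file, skipping all annotation lines.
sub genbank_sequence {
    my ($fh) = @_;
    my $seq       = '';
    my $in_origin = 0;
    while (my $line = <$fh>) {
        if ($in_origin) {
            last if $line =~ m{^//};    # end-of-record marker
            $line =~ s/[\s\d]+//g;      # drop position numbers and whitespace
            $seq .= $line;
        }
        elsif ($line =~ /^ORIGIN/) {
            $in_origin = 1;             # sequence lines start after this
        }
    }
    return $seq;
}

# Usage with a tiny made-up in-memory record
my $record = <<'END';
LOCUS       TEST
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//
END
open my $fh, '<', \$record or die "cannot open in-memory record: $!";
print genbank_sequence($fh), "\n";    # prints acgtacgtacgt
close $fh;
```

This never touches the FEATURES table, so it should be fast, but it also 
does no validation at all, which is why I would rather use something 
supported by the toolkit.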

-Danny

