[Bioperl-l] How to Handle Parse Errors

dmcwilli dmcwilli at utk.edu
Fri Jul 4 10:28:44 EDT 2003


A similar question came up in May, I think, but I have been unable to
find an answer in the FAQ or recent postings.

I am trying to parse GenBank records and find those that have a Region
feature with /region_name="Transit peptide".  I ran a broad Entrez search
and downloaded the results, so I'm reading the file locally.  When the
parser reaches a record with a "Het" feature, it fails and the script
exits prematurely with this message:

-------------------- WARNING ---------------------
MSG: exception while parsing location line
[join(bond(201),bond(203),bond(204),bond(204),bond(204),bond(204))] in
reading EMBL/GenBank/SwissProt, ignoring feature Het (seqid=8RUC_G):
------------- EXCEPTION -------------
MSG: operator "bond" unrecognized by parser
STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:160
STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:157
STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
STACK Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:123
STACK Bio::SeqIO::genbank::next_seq /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/genbank.pm:396
STACK toplevel ./biopl5.pl:20
--------------------------------------
---------------------------------------------------
Can't call method "primary_tag" on an undefined value at
/usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/genbank.pm line 400, <GEN0> line 23630.
# end of message
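
The warning shows the location parser rejecting the bond() operator in
this Het feature's location; the script then apparently dies one step
later when genbank.pm calls primary_tag on the feature it could not
build. One possible way to keep parsing the rest of such records is to
remove those feature entries from the text before BioPerl reads it. This
is a minimal sketch, not a BioPerl facility: strip_bond_features is a
hypothetical helper, and it assumes the standard flat-file layout in
which feature keys start in column 6 and continuation lines are indented
by 21 spaces. It throws the Het annotation away, so it only suits jobs
(like the Region search below) that do not need those features.

```perl
#!/usr/bin/perl
# Hypothetical pre-filter (not part of BioPerl): drop feature-table
# entries whose location uses the bond() operator, so the rest of each
# record can still be parsed. Assumes the standard flat-file layout:
# feature keys start in column 6, continuation lines are indented 21 spaces.
use strict;
use warnings;

sub strip_bond_features {
    my ($text) = @_;
    my $out = '';
    my @block;              # lines of the feature entry being collected
    my $in_features = 0;    # true while inside a FEATURES table

    # Append the pending feature entry to $out unless its location
    # (key line plus continuation lines before the first qualifier)
    # uses bond().
    my $flush = sub {
        return unless @block;
        my ($location) = $block[0] =~ /^ {5}\S+\s+(.*)$/;
        $location = '' unless defined $location;
        for my $line (@block[1 .. $#block]) {
            last if $line =~ m{^\s+/};     # qualifiers start with "/"
            (my $part = $line) =~ s/^\s+//;
            $location .= $part;
        }
        $out .= join '', @block unless $location =~ /\bbond\(\d/;
        @block = ();
    };

    for my $line (split /^/m, $text) {
        if ($line =~ /^FEATURES/) {
            $in_features = 1;
            $out .= $line;
        }
        elsif ($in_features && $line =~ /^ {5}\S/) {
            $flush->();                    # a new feature key begins
            push @block, $line;
        }
        elsif ($in_features && $line =~ /^ {21}/) {
            push @block, $line;            # continuation of current entry
        }
        else {
            $flush->();                    # anything else ends the entry
            $in_features = 0 if $line =~ /^\S/;
            $out .= $line;
        }
    }
    $flush->();
    return $out;
}

# Usage: perl strip_bond.pl records.gb > cleaned.gb
if (@ARGV) {
    local $/;                              # slurp: one whole file per read
    while (my $text = <>) {
        print strip_bond_features($text);
    }
}
```

The bond\(\d pattern requires a digit after "bond(" so that qualifier
text such as "disulfide bond(s)" is not mistaken for a location.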

My code is:

#!/usr/bin/perl
#
# tpfilter.pl
# Get transit peptides from files in GenBank format.  Uses BioPerl.
# David R. McWilliams dmcwilli at utk.edu
# 04-Jul-03

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

my $file = shift @ARGV;
die "usage: $0 genbank_file\n" unless defined $file;

# Class-method call instead of indirect-object "new Bio::SeqIO(...)".
my $in = Bio::SeqIO->new(-format => 'genbank', -file => $file);

my $datetime = scalar(localtime());
print "# Output of $0 on $file.\n";
print "# $datetime\n";

my $fnd = 0;
while (my $seq = $in->next_seq) {
    foreach my $feature ($seq->get_SeqFeatures) {
        if ($feature->primary_tag eq 'Region') {
            if ($feature->has_tag('region_name')) {
                my ($tag) = $feature->get_tag_values('region_name');
                if ($tag =~ /transit|signal/i) {
                    $fnd++;
                    # species() may be undef when a record has no organism.
                    my $species = $seq->species
                        ? $seq->species->binomial() : 'unknown species';
                    print ">", $seq->display_id(), "|",
                          "tp=", $feature->start, "..", $feature->end, "|",
                          $species, "|",
                          $seq->description(), "\n";
                    print $seq->subseq($feature->start, $feature->end), "\n";
                }
            }
        }
    }
}
print "# Found $fnd seqs w/ tp.\n";

# end code

If I remove the offending records by hand, the script works fine.  So
is there a way to keep parsing the offending records even though the
parser does not recognize this particular feature?  Or, failing that,
is there a way to catch the error and skip the record without aborting
the rest of the script?
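
For the second option, the usual Perl tool is eval { ... }, which traps
both BioPerl exceptions and ordinary run-time errors such as the
'Can't call method "primary_tag" on an undefined value' above. The
following is a minimal sketch, assuming every record ends with a "//"
line and that reading the whole file into memory is acceptable. It
splits the file into records, then parses each one from its own
in-memory filehandle inside eval, warning about and skipping any record
that dies. The helper names are hypothetical; only
Bio::SeqIO->new(-fh => ..., -format => 'genbank') and next_seq come from
BioPerl.

```perl
#!/usr/bin/perl
# Sketch: parse a GenBank file one record at a time, each inside
# eval {}, so a record that makes the parser die is reported and
# skipped. split_records, each_parsed_record and parse_genbank_record
# are illustrative helpers, not BioPerl functions. Assumes every
# record ends with a "//" line.
use strict;
use warnings;

# Split GenBank-format text into records, each keeping its "//" line.
sub split_records {
    my ($text) = @_;
    my @records;
    my $current = '';
    for my $line (split /^/m, $text) {
        $current .= $line;
        if ($line =~ m{^//\s*$}) {
            push @records, $current;
            $current = '';
        }
    }
    push @records, $current if $current =~ /\S/;   # unterminated tail
    return @records;
}

# Run $parse on each record inside eval; on failure, warn and continue.
# Returns the defined results of the records that parsed.
sub each_parsed_record {
    my ($parse, @records) = @_;
    my @parsed;
    for my $i (0 .. $#records) {
        my $result;
        my $ok = eval { $result = $parse->($records[$i]); 1 };
        unless ($ok) {
            my $err = $@ || 'unknown error';
            warn "skipping record ", $i + 1, ": $err";
            next;
        }
        push @parsed, $result if defined $result;
    }
    return @parsed;
}

# Parse a single record with BioPerl through an in-memory filehandle
# (available since Perl 5.8).
sub parse_genbank_record {
    my ($record) = @_;
    require Bio::SeqIO;                 # loaded only when actually called
    open my $fh, '<', \$record or die "cannot open record: $!\n";
    my $in = Bio::SeqIO->new(-fh => $fh, -format => 'genbank');
    return $in->next_seq;
}

# Usage: perl skip_bad_records.pl records.gb
if (@ARGV) {
    local $/;                           # slurp: one whole file per read
    while (my $text = <>) {
        my @seqs = each_parsed_record(\&parse_genbank_record,
                                      split_records($text));
        print $_->display_id, "\n" for @seqs;
    }
}
```

To adapt the original script, the feature loop from its while body
could replace the display_id print. Splitting first is a deliberate
choice: if next_seq dies partway through a record, the shared input
stream is left somewhere inside that record, and it is not obvious that
the next call would resynchronise on the following LOCUS line. Giving
each record its own filehandle sidesteps that question.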

Regards,
	      


