[Bioperl-l] The Feature changes which have broken compatibility

Allen Day allenday at ucla.edu
Wed Feb 2 17:21:36 EST 2005


> I'm not sure how many people are aware how this works - here is what
> happens under the hood:
> 
> 1) Bio::FeatureIO::gff is initialized. This entails establishing a
> connection to sourceforge to download the sequence ontology. This of
> course will not work if you are offline. Even if you are online, it
> doesn't seem to work for me, and it is of course dependent on the
> vagaries of whether SourceForge and its mirrors are working. Even if all
> of this works, there is an initial start-up lag which may be
> unacceptable to some applications. Also, not everyone using bioperl is
> in a country with fast internet access and a local-ish SourceForge mirror.
> 
> In addition, it hardcodes metadata about the ontology in bioperl (see
> Bio::Ontology::DocumentRegistry) which is asking for trouble.
> 
> In addition, it downloads the ontology in a legacy, deprecated format,
> because that's all bioperl currently supports. That is also asking for
> trouble further down the line.
> 
> Why is it doing all this? Purely in order to check that the feature types
> provided in the GFF file are valid SOFA terms. Look, I already know all
> the GFF3 files I want to parse have valid SOFA types. If I want to check,
> I'll do this myself thanks, I don't want bioperl to secretly do it for me
> in a hokey way that requires me to be online and in the USA, every single
> time I parse a file.
> 
> In fact, there is already a script for validating a GFF3 file, in the SO
> software repository (which uses Bio::Tools::GFF) which does a much more
> thorough job, checking feature parentage too.
> 
> What happened to modularity?  You know, parsing in a parser, verification
> in a verifier.
> 
> 2) It starts parsing features, assigning Bio::Ontology::Term objects to
> each feature (the type). This entails having Graph::Directed, which is
> what Jason is alluding to. Not that bad in itself, but unnecessary for
> the majority of apps that just want to parse GFF.
> 
> Is it just me that thinks this is madness? Can someone please make it
> stop?

Correct, but this behavior is disabled by default.  From the 
FeatureIO/gff.pm POD:

  my $featureOut = Bio::FeatureIO->new(-format => 'gff',
    -version => 3,
    -fh => \*STDOUT,
    -validate_terms => 1, #boolean. validate ontology
                          #terms online?  default 0 (false).
  );

If you don't turn this on, it merely creates a
Bio::Annotation::OntologyTerm object with the identifier or term name from
the GFF file; no validation is attempted.
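A minimal sketch of reading a GFF3 file with the default behavior, i.e. with -validate_terms left unset so no ontology lookup should take place. It assumes the Bio::FeatureIO constructor shown in the POD excerpt, plus the standard -file argument and the next_feature iterator; the helper name count_gff3_features and the file name in the usage comment are hypothetical.

```perl
use strict;
use warnings;
use Bio::FeatureIO;

# Hypothetical helper: parse a GFF3 file and return how many features
# were read. -validate_terms is omitted, so it keeps its default of 0:
# feature types are wrapped as-is, not checked against SO/SOFA.
sub count_gff3_features {
    my ($file) = @_;
    my $in = Bio::FeatureIO->new(
        -format  => 'gff',
        -version => 3,
        -file    => $file,
    );
    my $count = 0;
    while (my $feature = $in->next_feature) {
        $count++;
    }
    return $count;
}

# Command-line use (file name is a placeholder):
#   perl count_gff3.pl annotations.gff3
if (@ARGV) {
    print count_gff3_features($ARGV[0]), " features\n";
}
```

Wrapping the loop in a subroutine keeps the parsing step separate from any later checking, which is the parser/verifier split asked for in the quoted message.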

Furthermore, if you do want to validate against the SO/SOFA ontologies
but don't want to rely on the live ontologies on the web, you can first
parse SO/SOFA from local files (in the deprecated format, admittedly, but
that isn't my doing). That fills the Bio::Ontology cache, so no network
queries happen.
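A rough sketch of that local-file workflow, relying on the behavior described above (parsing fills the ontology cache). Several pieces are assumptions: the helper name preload_local_ontology and all file names are hypothetical, and the ontology name 'Sequence Ontology Feature Annotation' is a guess at the name gff.pm requests, not a value confirmed here. It uses Bio::OntologyIO's flat SO driver ('so') with its -defs_file and -ontology_name arguments.

```perl
use strict;
use warnings;
use Bio::OntologyIO;
use Bio::FeatureIO;

# Hypothetical helper: parse a local copy of SO/SOFA in the flat
# (DAG-Edit) format before any -validate_terms lookup happens.
# The ontology name passed in is an assumption about what gff.pm asks for.
sub preload_local_ontology {
    my ($ontology_file, $defs_file, $ontology_name) = @_;
    my $parser = Bio::OntologyIO->new(
        -format        => 'so',          # flat SO format driver
        -file          => $ontology_file,
        -defs_file     => $defs_file,
        -ontology_name => $ontology_name,
    );
    return $parser->next_ontology;
}

# Command-line use (all file names are placeholders):
#   perl validate_gff3.pl sofa.ontology sofa.definition annotations.gff3
if (@ARGV == 3) {
    my ($ont_file, $defs_file, $gff_file) = @ARGV;
    my $ontology = preload_local_ontology(
        $ont_file, $defs_file, 'Sequence Ontology Feature Annotation')
        or die "no ontology parsed from $ont_file\n";
    print "loaded ontology: ", $ontology->name, "\n";

    # With the ontology already parsed, validation should not need to
    # fetch it from the network (assuming the names match).
    my $in = Bio::FeatureIO->new(
        -format         => 'gff',
        -version        => 3,
        -file           => $gff_file,
        -validate_terms => 1,
    );
    my $n = 0;
    $n++ while $in->next_feature;
    print "validated $n features\n";
}
```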

-Allen

