[Bioperl-l] genpept/swiss

hilmar.lapp@pharma.Novartis.com
Mon, 4 Sep 2000 17:18:10 +0100




This describes exactly my situation in which I have to read in data in
all sorts of different formats (and people's interpretations of these
formats).
The problem now is that BioPerl throws a warning if a sequence does not
comply 100% with the standards and exits. While at that moment I want to

     You mean it throws an exception. (Issuing a warning shouldn't cause an
     exit.)

be able to say that it can ignore the warning if (e.g.) it has read the
sequence correctly.

     Does this sound like a call for a callback a client program can
     provide? The question then is what should be passed to the callback
     routine? The sequence object as it has been constructed so far? Sounds
     fragile, and may be useless in many cases. The complete offending
     source record? Would discard the parse done so far (for the callback),
     and would require a partial rewrite of the parsers because they read
     line-by-line (at least most if not all of the rich format parsers).
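To make the callback idea concrete, here is a minimal sketch in Python rather than Perl. It is not BioPerl code: `parse_record`, `ParseError`, `on_error` and `lenient` are hypothetical names, and the three-space `KEY   value` layout is an assumed toy format. The callback receives both the offending line and the record as built so far, and its return value decides whether the parser skips the line or raises as before.

```python
class ParseError(Exception):
    """Raised when a record violates the expected (toy) format."""


def parse_record(lines, on_error=None):
    """Parse 'KEY   value' lines into a dict, one line at a time.

    on_error is an optional callback(line, partial_record). It returns
    True to skip the offending line and continue, or False to abort.
    """
    record = {}
    for line in lines:
        key, sep, value = line.partition("   ")
        if not sep or not key.isalpha():
            # Offending line: let the client decide, otherwise fail as before.
            if on_error is not None and on_error(line, record):
                continue
            raise ParseError(f"malformed line: {line!r}")
        record.setdefault(key, []).append(value.strip())
    return record


def lenient(line, partial_record):
    # Hypothetical client policy: tolerate junk only after the ID was read.
    return "ID" in partial_record
```

A client would call `parse_record(lines, on_error=lenient)`. Passing the partial record keeps the line-by-line parser unchanged, but the fragility concern remains: the callback sees whatever state the parser happened to reach.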

Something that would be really nice to have is a more modular approach
in which it would be easy to say:  'this data is in a format which is
EMBL, with the following quirks, additional fields, ... '.

     Yes. But this needs a careful design of how you can split up the parsing
     of a sequence record into subtasks that are a) fairly independent (and
     can thus be overridden by your QuirkyEMBL parser), and b) common to
     all (rich) formats. Has anyone done any work in this direction so far?
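One possible shape for such a split, sketched in Python under stated assumptions: each two-letter line code gets its own handler method, and a dialect parser overrides or adds only the handlers that differ. The class names, the `handle_XX` naming scheme, the `ZZ` field and the lower-case ID quirk are all hypothetical illustrations, not an existing BioPerl design.

```python
class FlatFileParser:
    """Splits a record into per-line-code subtasks that subclasses can override."""

    def parse(self, lines):
        record = {}
        for line in lines:
            code, rest = line[:2], line[2:].strip()
            # Dispatch on the line code; unknown codes go to a common fallback.
            handler = getattr(self, "handle_" + code, self.handle_unknown)
            handler(rest, record)
        return record

    def handle_ID(self, text, record):
        record["id"] = text.split()[0]

    def handle_DE(self, text, record):
        record["description"] = text

    def handle_unknown(self, text, record):
        raise ValueError("unexpected line: " + text)


class QuirkyEMBLParser(FlatFileParser):
    """Hypothetical dialect: lower-case IDs and an extra ZZ note field."""

    def handle_ID(self, text, record):
        # Quirk: normalize the identifier to upper case.
        record["id"] = text.split()[0].upper()

    def handle_ZZ(self, text, record):
        # Additional field that the standard parser does not know.
        record.setdefault("notes", []).append(text)
```

With this layout, `QuirkyEMBLParser().parse(lines)` accepts a `ZZ` line that the base parser rejects. Whether line codes are the right granularity for all rich formats is exactly the open design question.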


          Hilmar