[Bioperl-l] genpept/swiss

Hilmar Lapp hlapp@gmx.net
Mon, 04 Sep 2000 10:00:15 +0200


Andrew Dalke wrote:
> 
> Hilmar Lapp <hlapp@gmx.net>:
> >Some of you may object to this
> 
> I'm one of those objectors.  If the format isn't right in one place,
> how certain are you that the recovery is correct and didn't skip
> important information?
> 

I'm not at all certain, and that's why the recovery is still reported in
a way that you can turn into an exception programmatically. The point
here is that the SeqIO parsers are not meant to be format validators: if
you've got your GenBank format writer wrong, using the BioPerl parser as
the judge of that is probably not the right approach. The main objective,
I think, is to be able to read sequence entries produced by someone you
believe 'does it right'.
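
To illustrate the idea of a lenient parser whose recovery reports can be
escalated into exceptions, here is a minimal sketch in Python rather than
Perl. It is not BioPerl code: `FormatWarning`, `parse_entry` and
`parse_strict` are hypothetical names, and the 'TAG value' line format is
a simplification assumed for the example.

```python
import warnings


class FormatWarning(UserWarning):
    """Issued when a line cannot be read as 'TAG value' (hypothetical)."""


def parse_entry(lines):
    """Leniently parse 'TAG value' lines; warn about and skip bad ones."""
    record = {}
    for number, line in enumerate(lines, start=1):
        tag, sep, value = line.partition(" ")
        # Assumption for this sketch: a valid tag is a single upper-case word.
        if not sep or not tag.isupper():
            warnings.warn(
                f"line {number}: skipping malformed line {line!r}",
                FormatWarning,
            )
            continue
        record[tag] = value.strip()
    return record


def parse_strict(lines):
    """Same parser, but every FormatWarning is raised as an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FormatWarning)
        return parse_entry(lines)


entry = ["LOCUS AB000001", "definition has a bad tag", "ACCESSION AB000001"]

# Lenient use: the bad line is skipped and the problem is only recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = parse_entry(entry)
```

A caller who just wants to get through the entries uses `parse_entry`;
one who cares about every line calls `parse_strict` and gets a
`FormatWarning` exception at the first misformatted tag.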

Most people will not be bothered about possibly missing important
information for one in 1e5 sequences because that one has a misformatted
tag. They just want to go through these sequences without having to add a
single line of code to the parser that is specific to their current
release of whatever database and would have to be re-tuned for the next
one.

In turn, together with what you said, this means that (most of?) the
BioPerl parsers in their current state are not suitable if you want your
program to capture every single piece of information contained in the
database (e.g. if you wished to convert GenBank into a relational format).

Am I missing something?

> 
> Second, there may be semantic differences between the two which are
> not intertranslatable.  That indicates a problem in the formats, in
> that they can't be used to specify everything they need to do.

It is not necessarily an inability to specify something they need to. It
may simply be different semantics for expressing the same thing; the
differing notions of feature tables may be one example, the abuse of
structured comments another.

> something, and 80% is often good enough.  Is it possible to make
> the bioperl code smarter so it knows how to deal with these different
> cases (other than just ignoring them)?  Is it possible to use
> bioperl to write a better GenPept in the first place?  Would you
> like to work on that code?  BTW, down that path lies many meetings

With lots of additional code it may indeed be possible, but I think it's
beyond the scope of BioPerl, and I'm not sure there is a big enough need
in the user community to justify extending the scope of BioPerl that way.
At the very least, there are many other things that deserve attention and
code first.

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
NFI Vienna, IFD/Bioinformatics             phone: +43 1 86634 631
A-1235 Vienna                                fax: +43 1 86634 727
-----------------------------------------------------------------