[Bioperl-l] Fix for fasta loading into bioperl-db

Hilmar Lapp hlapp@gnf.org
Wed, 12 Jun 2002 09:27:03 -0700

> -----Original Message-----
> From: Elia Stupka [mailto:elia@fugu-sg.org]
> Sent: Wednesday, June 12, 2002 12:43 AM
> To: Bioperl
> Subject: [Bioperl-l] Fix for fasta loading into bioperl-db
> It turns out that loading simple fasta files into bioperl-db hadn't really
> been checked so far.
> I made a few changes to make it work. The first one is non-controversial:
> a fasta-parsed seq is not a RichSeqI object and thus does not have a
> seq_version. I simply set seq_version to zero if the object is not
> RichSeqI compliant.
> The second one is trickier, and the fix is temporary. All fasta-parsed
> sequences come back with accession unknown; they just have a display_id
> and a description.
> Possible solutions:
> 1) Decide in bioperl-live that when parsing fasta files, the
> accession_number is set to the display_id (this kind of makes sense
> because if you load a genbank file and dump it as fasta, the accession
> number gets put as display_id at the beginning of the header). I didn't go
> ahead because I wanted to hear what people thought before touching our
> sacred parsers.
> 2) In bioperl-db, when trying to store a sequence that has accession
> unknown, change the accession to the display_id. This is my temporary fix.

IMO this is the right one. Bioperl (the semantics, I mean) should not be driven by bioperl-db or biosql (though they may reveal bugs). This should happen in the adaptors.
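
To make the adaptor-level idea concrete, here is a minimal, language-neutral sketch in Python. It is not bioperl-db code: SeqRecord, storage_key and the UNKNOWN placeholder are hypothetical names invented for illustration, and a seq_version of None is only an assumed way to model an object that is not RichSeqI compliant. The point is that the fallback is applied when building the storage key, leaving the parsed object untouched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed placeholder accession for fasta-parsed sequences (hypothetical constant).
UNKNOWN = "unknown"


@dataclass
class SeqRecord:
    """Hypothetical stand-in for a parsed sequence object."""
    display_id: str
    description: str = ""
    accession_number: str = UNKNOWN
    seq_version: Optional[int] = None  # None models a non-RichSeqI object


def storage_key(seq: SeqRecord, division: str = "") -> Tuple[str, int, str]:
    """Build the (accession, version, division) key at storage time.

    The sequence object itself is not modified; the fallbacks live only
    in the storage layer.
    """
    accession = seq.accession_number
    if not accession or accession == UNKNOWN:
        # Fall back to display_id when the parser gave no real accession.
        accession = seq.display_id
    # Default the version to zero when the object carries none.
    version = seq.seq_version if seq.seq_version is not None else 0
    return (accession, version, division)
```

With this sketch, a fasta-style record such as SeqRecord(display_id="seq1") is keyed by its display_id and version zero, while a record that already carries an accession and version keeps both.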

> 3) Change the actual SQL constraint in bioperl-db. This one is not easy
> because there is no easy way to tell MySQL to put the constraint on
> accession, version and division, OR (if accession is unknown) on display_id,
> version, division. So from the SQL point of view all we could do is remove
> the constraint and rely on checking it only in the code.

I'd vote against that.

There is no easy solution for guessing missing attributes.

Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757