[Bioperl-l] Reading sequences without parsing them

Karger, Amir AKarger@CuraGen.com
Mon, 16 Jul 2001 11:19:26 -0400


> -----Original Message-----
> From: Ewan Birney [mailto:birney@ebi.ac.uk]
> 
> On Mon, 16 Jul 2001, Karger, Amir wrote:
> 
> There is no built-in way to do this nicely inside Bioperl.

That's what I thought. Bummer.
 
> options
> 
>    (a) use IO::String, but that will depend on Bioperl's write_seq
> output - i.e., this is not what you want, because when we change Bioperl's
> write_seq for a format you will think all your sequences have been updated

Exactly. Of course I would double check and test those pieces of the entry
that I think are important and see whether they've changed. But still
annoying.
 
>    (b) trust the built-in accession.version system for sequences, not
> annotations

You mean RichSeq->sequence_version?

By the way, swiss.pm looks for a line starting "SV" for sequence version
(Bio/SeqIO/swiss.pm line 195 in bioperl-live), but the current (May 2000)
Swiss-Prot user manual says there is no SV line for Swiss-Prot.
 
>    (c) trust the Date line for annotation updates (available
> in Swiss-Prot, EMBL, GenBank)

I was thinking of doing that. Although the swiss-prot docs mention that they
don't update the "last annotation update" field for "global changes",
whatever that means. (But it looks like those would change fingerprints
anyway.)
 
> I could imagine a complex system like:

[snip]

>    # auto update complies with the implicit SeqIO interface of next_seq
>    # but only gives back new MD5 entries
> 
>    while( (my $updated_entry = $auto->next_seq()) ) {
>       # do something with updated
>    }

So are you saying that entries that are already in the database with the
same MD5 get ignored, and for others, the string is passed to
SeqIO::next_seq (which loads swiss.pm since you told it the format above)?
That would indeed be cool.
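
For what it's worth, here is a minimal sketch of the filtering step as I
understand it. It is not part of Bioperl: changed_entries and the %seen hash
are names I made up, and I'm assuming a Swiss-Prot/EMBL style flat file where
each entry starts with an ID line and ends with a line containing only "//".
The raw entry text is fingerprinted without any parsing, and only entries
whose fingerprint is new or different are returned.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not a Bioperl function).
# $fh   - filehandle open on a Swiss-Prot/EMBL style flat file
# $seen - hash ref mapping entry ID -> MD5 hex digest of the raw entry
# Returns the raw text of every entry that is new or has changed,
# and records the new fingerprints in %$seen.
sub changed_entries {
    my ($fh, $seen) = @_;
    my @changed;
    local $/ = "\n//\n";    # assumption: entries end with a bare "//" line
    while (my $entry = <$fh>) {
        # Skip any chunk that has no ID line (e.g. trailing whitespace)
        next unless $entry =~ /^ID\s+(\S+)/m;
        my $id     = $1;
        my $digest = md5_hex($entry);
        # Same fingerprint as last time: nothing to do, no parsing needed
        next if defined $seen->{$id} && $seen->{$id} eq $digest;
        $seen->{$id} = $digest;
        push @changed, $entry;
    }
    return @changed;
}
```

Each returned string could then presumably be handed to the real parser with
something like Bio::SeqIO->new(-fh => IO::String->new($entry), -format =>
'swiss'), so swiss.pm only ever sees the entries that changed. I haven't
tested that part here.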
  
> the MD5 is probably best implemented as a DBM file.

Yup. Although in my case I'm already using DBI to put data into an Oracle
database, so the fingerprints will just be one more piece of info for each
entry.
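
For anyone without a database handy, the DBM route might look roughly like
the sketch below. The assumptions are mine: I use SDBM_File only because it
ships with Perl (DB_File or another DBM would be tied the same way), and
open_fingerprints and fingerprint_changed are made-up names, not Bioperl
calls.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: tie a hash to a DBM file so fingerprints persist
# between runs. Returns a reference to the tied hash.
sub open_fingerprints {
    my ($path) = @_;
    my %db;
    tie %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $path: $!";
    return \%db;
}

# Hypothetical helper: true if the raw entry text is new or differs from
# the stored fingerprint; stores the new fingerprint as a side effect.
sub fingerprint_changed {
    my ($db, $id, $entry_text) = @_;
    my $digest = md5_hex($entry_text);
    return 0 if defined $db->{$id} && $db->{$id} eq $digest;
    $db->{$id} = $digest;
    return 1;
}
```

With DBI the same check would just be a lookup on a fingerprint column keyed
by the entry ID.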

> If you wrote something like this that would be great!

It would be, but I wouldn't bet on its happening any time soon :(

Amir Karger
Curagen Corporation