[Bioperl-l] Reading sequences without parsing them

Andrew Dalke dalke@dalkescientific.com
Mon, 16 Jul 2001 17:09:32 +0100


Elia:
>I wonder if I am on the right track, but doesn't it sound very much like
>this problem would benefit very much from the biopython solution, where
>the parser only parses some parts speeding up the finding of updated ones?

Yes, it does.  You can tell the biopython parser to generate
events for the sequence and feature blocks and get all the text from
those areas to generate your fingerprint.
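
As a rough illustration of the fingerprint idea, here is a minimal sketch in plain Python. It does not use the biopython event parser. It assumes the usual SWISS-PROT flat-file layout: FT lines for features, sequence lines following the SQ line, and // closing the entry. The function name fingerprint and the choice of SHA-256 are illustrative assumptions, not anything from biopython or bioperl.

```python
import hashlib


def fingerprint(record_text):
    """Hash only the feature (FT) lines and the sequence of one entry.

    Assumes SWISS-PROT flat-file conventions: 'FT' feature lines,
    an 'SQ' header followed by sequence lines, and '//' as terminator.
    """
    digest = hashlib.sha256()
    in_sequence = False
    for line in record_text.splitlines():
        if line.startswith("//"):
            break  # end of this entry
        if line.startswith("SQ"):
            in_sequence = True  # sequence lines follow the SQ header
            continue
        if in_sequence:
            # Drop the spacing used for layout; keep only residues.
            digest.update("".join(line.split()).encode("ascii"))
        elif line.startswith("FT"):
            digest.update(line.rstrip().encode("ascii"))
            digest.update(b"\n")
    return digest.hexdigest()
```

Two versions of an entry that differ only in, say, a DT date line get the same fingerprint, while any change to the features or the sequence gives a different one.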

The usefulness of this approach depends on how many trivial
changes occur in the database record.  Suppose there are none.
From my timings with SWISS-PROT, reading only a few records (needed
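for FASTA) takes about 60% less time than reading all the records needed
for the full object model.  Call the full parse time T.  An unchanged
record needs only the quick check, but a changed record needs two
passes, the check and then the full parse, so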
  time to check = 0.4 T
  time to check then parse = 1.4 T

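With x the fraction of records that are unchanged,
x * 0.4 + (1-x) * (1 + 0.4) == 1 when x == 40%, so you only get
a win if more than 40% of the records are unchanged, that is, if
fewer than 60% of them have changed.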
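
To make the arithmetic easy to check, here is a small sketch of the same cost model in Python. The function names and the default costs (0.4 for the check, 1.0 for the full parse, both in units of T) are assumptions taken from the figures above, not measured values of their own.

```python
def expected_cost(changed_fraction, check_cost=0.4, parse_cost=1.0):
    """Average cost per record, in units of the full parse time T.

    Unchanged records pay only the check; changed records pay the
    check plus the full parse.
    """
    unchanged_fraction = 1.0 - changed_fraction
    return (unchanged_fraction * check_cost
            + changed_fraction * (check_cost + parse_cost))


def break_even_changed_fraction(check_cost=0.4, parse_cost=1.0):
    """Changed fraction at which check-first costs the same as always parsing.

    Solves check_cost + c * parse_cost == parse_cost for c.
    """
    return (parse_cost - check_cost) / parse_cost


if __name__ == "__main__":
    for changed in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(changed, expected_cost(changed))
    print("break-even changed fraction:", break_even_changed_fraction())
```

With these numbers the average cost is 0.4 + c, where c is the fraction of records that changed, so checking first is cheaper than always parsing only while c stays below the break-even fraction.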

                    Andrew
                    dalke@dalkescientific.com