[Biopython-dev] PIR parsing

Andrew Dalke dalke at acm.org
Sat Dec 9 01:55:08 EST 2000


Me:
>I've written a much more complete PIR CODATA parser which works with
>the latest PIR release (Release 66.00, September 30, 2000).  I tested
>it against pir1.dat and pir3.dat.

I'm testing it against pir2.dat, which is 394,221,543 bytes
uncompressed and contains 174,756 records.  I'm doing the run on
the bioperl.org machine since it has more free disk space
than my laptop.  The parser handles about 3 or 4 records per
second (sshd takes half the CPU!).
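For a sense of scale, here is the back-of-the-envelope arithmetic implied by those figures. It is a rough sketch that assumes a constant 3 or 4 records per second over the whole file. The function name `estimate_hours` is hypothetical and is not part of the parser.

```python
# Figures quoted in the message: size and record count of pir2.dat.
TOTAL_BYTES = 394_221_543
TOTAL_RECORDS = 174_756


def estimate_hours(records: int, records_per_second: float) -> float:
    """Hypothetical helper: wall-clock hours to parse `records` at a constant rate."""
    return records / records_per_second / 3600.0


# Average record size, assuming the byte count covers only record data.
avg_record_bytes = TOTAL_BYTES / TOTAL_RECORDS

for rate in (3.0, 4.0):
    print(f"{rate:.0f} records/s -> {estimate_hours(TOTAL_RECORDS, rate):.1f} hours")
print(f"average record size: about {avg_record_bytes:.0f} bytes")
print(f"15% of the file: about {round(0.15 * TOTAL_RECORDS):,} records")
```

At that rate, the full pir2.dat run takes roughly 12 to 16 hours, so the 15% progress report below covers about 26,000 records.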

I've processed 15% of the records and found only two problems
in my parser.  Both are my fault because I made too strong an
assumption about the format.

BTW, the format definition at 
   http://pir.georgetown.edu/pirwww/otherinfo/doc/co2.pdf
is wrong in many of its details, probably because it is
6 years old.

                    Andrew
