[Biopython-dev] PIR parsing

Mon Dec 11 00:30:52 EST 2000

Andrew,

>   What is the point of having both the "ref" and "dat" formats
> in PIR?
[snip]
> As far as I can tell, the ref format is easier to machine parse
> than the dat one, and is more compact.  The dat format is easier
> for a human to scan.  Also, the dat format contains the sequence
> information while the ref one does not.
>
> Can anyone here provide me with some background?

The seq file is usually derived from dat so that BLAST databases (or
anything else that requires FASTA-formatted sequences) can be built. As I
understand it, ref is a trimmed-down dat without the sequence data, so you
can save some space by not keeping the partially redundant dat. I don't
know for sure, but the more compact format may be another space-saving
measure along the same lines.

Perhaps, though, they're competing with the OWL database for the
most obfuscated database format ;)
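
For what it's worth, here is a rough sketch of the "seq derived from dat"
step mentioned above. It is not PIR's actual tooling. It assumes the entry
identifiers and sequences have already been pulled out of the dat file by
some parser that is not shown, and the names to_fasta and EXAMPLE1 are made
up for illustration.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text.

    `records` is assumed to come from a dat parser that is not shown here.
    """
    lines = []
    for identifier, sequence in records:
        # Header line: '>' followed by the entry identifier.
        lines.append(">" + identifier)
        # Sequence lines, wrapped to a fixed width.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical entry, not a real PIR record.
    print(to_fasta([("EXAMPLE1", "ACDEFGHIKLMNPQRSTVWY")], width=10), end="")
```

The resulting file is the kind of input that BLAST's database formatting
step expects, which is why keeping seq alongside ref covers most uses.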

Cheers,
Edwin.
-------------------------------------------------------------------------------
Edwin Steele
QA Manager, eBioinformatics.             http://www.ebioinformatics.com
email: edwin.steele at eBioinformatics.com  Bay 16/104, Australian Technology Park
ph: +61 (2) 9209-4765                    Eveleigh 1430, NSW, Australia.