[Biopython-dev] PIR parsing
Edwin Steele
edwin.steele at eBioinformatics.com
Mon Dec 11 00:30:52 EST 2000
Andrew,
> What is the point of having both the "ref" and "dat" formats
> in PIR?
[snip]
> As far as I can tell, the ref format is easier to machine parse
> than the dat one, and is more compact. The dat format is easier
> for a human to scan. Also, the dat format contains the sequence
> information while the ref one does not.
>
> Can anyone here provide me with some background?
seq is usually derived from dat so that BLAST databases (or anything
else that requires FASTA-formatted sequences) can be built. As I understand
it, ref is a trimmed-down dat without the sequence data, so you can save some
space by not keeping the partially redundant dat. I don't know for sure,
but the more compact format might be another space-saving measure along
those lines. Then again, perhaps they're just competing with the OWL
database for the most obfuscated database format ;)
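As a rough illustration of the "derive FASTA from dat" step, here is a minimal sketch. It assumes the dat entries have already been parsed into (identifier, description, sequence) tuples. The parsing itself is not shown, and the function name to_fasta and the 60-column line width are hypothetical choices, not part of PIR or Biopython.

```python
def to_fasta(records, width=60):
    """Format (identifier, description, sequence) tuples as FASTA text.

    Hypothetical helper: the record layout and the line width are
    assumptions for illustration, not anything defined by PIR.
    """
    lines = []
    for identifier, description, sequence in records:
        # Header line: '>' followed by the identifier and a free-text description.
        lines.append(f">{identifier} {description}".rstrip())
        # Sequence lines, split into chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up record, purely to show the output layout.
    print(to_fasta([("A1", "test protein", "ACDEFGHIK")], width=4), end="")
```

The resulting text could then be handed to whatever tool builds the BLAST database.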
Cheers,
Edwin.
-------------------------------------------------------------------------------
Edwin Steele
QA Manager, eBioinformatics. http://www.ebioinformatics.com
email: edwin.steele at eBioinformatics.com
ph: +61 (2) 9209-4765
Bay 16/104, Australian Technology Park, Eveleigh 1430, NSW, Australia.