[Biopython-dev] SwissProt parsing inconsistency between Bio.SeqIO, Bio.SwissProt
Michiel de Hoon
mjldehoon at yahoo.com
Tue Apr 21 07:12:20 EDT 2009
Dear all,
I've noticed an inconsistency between how Bio.SeqIO and Bio.SwissProt parse DE (description) lines in SwissProt files.
For these DE lines:
DE   RecName: Full=11S globulin seed storage protein 2;
DE   AltName: Full=11S globulin seed storage protein II;
DE   AltName: Full=Alpha-globulin;
DE   Contains:
DE     RecName: Full=11S globulin seed storage protein 2 acidic chain;
DE     AltName: Full=11S globulin seed storage protein II acidic chain;
DE   Contains:
DE     RecName: Full=11S globulin seed storage protein 2 basic chain;
DE     AltName: Full=11S globulin seed storage protein II basic chain;
DE   Flags: Precursor;
a SwissProt record created by Bio.SwissProt contains the following:
>>> print swiss_record.description
RecName: Full=11S globulin seed storage protein 2;
AltName: Full=11S globulin seed storage protein II;
AltName: Full=Alpha-globulin;
Contains:
  RecName: Full=11S globulin seed storage protein 2 acidic chain;
  AltName: Full=11S globulin seed storage protein II acidic chain;
Contains:
  RecName: Full=11S globulin seed storage protein 2 basic chain;
  AltName: Full=11S globulin seed storage protein II basic chain;
Flags: Precursor;
but a SeqRecord returned by Bio.SeqIO contains this:
>>> print seq_record.description
RecName: Full=11S globulin seed storage protein 2;
AltName: Full=11S globulin seed storage protein II;
AltName: Full=Alpha-globulin;
Contains:
RecName: Full=11S globulin seed storage protein 2 acidic chain;
AltName: Full=11S globulin seed storage protein II acidic chain;
Contains:
RecName: Full=11S globulin seed storage protein 2 basic chain;
AltName: Full=11S globulin seed storage protein II basic chain;
Flags: Precursor;
So Bio.SeqIO removes the leading spaces from each line, which loses the indentation that marks the entries under "Contains:", whereas Bio.SwissProt keeps them.
For consistency, I think it's better to decide on one of these two styles.
My preference is for the approach used by Bio.SwissProt. Any objections to modifying the code used by Bio.SeqIO?
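To make the difference concrete, here is a minimal sketch of the two styles. It is not the actual Bio.SwissProt or Bio.SeqIO code; the helper names de_keep_indent and de_strip_indent are made up for illustration, and it assumes each DE line starts with the two-letter line code followed by three spaces.

```python
# Hypothetical helpers for illustration only; this is NOT the parsing code
# used by Bio.SwissProt or Bio.SeqIO.

# A shortened set of DE lines, assuming the usual layout: a two-letter line
# code, three spaces, then the content (nested entries get two extra spaces).
DE_LINES = [
    "DE   RecName: Full=11S globulin seed storage protein 2;",
    "DE   Contains:",
    "DE     RecName: Full=11S globulin seed storage protein 2 acidic chain;",
    "DE   Flags: Precursor;",
]


def de_keep_indent(lines):
    """Drop the 5-character 'DE   ' prefix but keep any further indentation."""
    return "\n".join(line[5:].rstrip() for line in lines)


def de_strip_indent(lines):
    """Drop the prefix and all leading whitespace, flattening the nesting."""
    return "\n".join(line[5:].strip() for line in lines)


print(de_keep_indent(DE_LINES))
print(de_strip_indent(DE_LINES))
```

With the first helper the line under "Contains:" keeps its two leading spaces, so the nesting can still be recovered from the description; with the second it cannot.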
--Michiel.