[Biopython-dev] SwissProt parsing inconsistency between Bio.SeqIO, Bio.SwissProt

Michiel de Hoon mjldehoon at yahoo.com
Tue Apr 21 07:12:20 EDT 2009


Dear all,

I've noticed an inconsistency between how Bio.SeqIO and Bio.SwissProt parse DE (description) lines in SwissProt files.

For these DE lines:

DE   RecName: Full=11S globulin seed storage protein 2;
DE   AltName: Full=11S globulin seed storage protein II;
DE   AltName: Full=Alpha-globulin;
DE   Contains:
DE     RecName: Full=11S globulin seed storage protein 2 acidic chain;
DE     AltName: Full=11S globulin seed storage protein II acidic chain;
DE   Contains:
DE     RecName: Full=11S globulin seed storage protein 2 basic chain;
DE     AltName: Full=11S globulin seed storage protein II basic chain;
DE   Flags: Precursor;

a SwissProt record created by Bio.SwissProt contains the following:

>>> print(swiss_record.description)
RecName: Full=11S globulin seed storage protein 2;
AltName: Full=11S globulin seed storage protein II;
AltName: Full=Alpha-globulin;
Contains:
  RecName: Full=11S globulin seed storage protein 2 acidic chain;
  AltName: Full=11S globulin seed storage protein II acidic chain;
Contains:
  RecName: Full=11S globulin seed storage protein 2 basic chain;
  AltName: Full=11S globulin seed storage protein II basic chain;
Flags: Precursor;

but a SeqRecord returned by Bio.SeqIO contains this:

>>> print(seq_record.description)
RecName: Full=11S globulin seed storage protein 2;
AltName: Full=11S globulin seed storage protein II;
AltName: Full=Alpha-globulin;
Contains:
RecName: Full=11S globulin seed storage protein 2 acidic chain;
AltName: Full=11S globulin seed storage protein II acidic chain;
Contains:
RecName: Full=11S globulin seed storage protein 2 basic chain;
AltName: Full=11S globulin seed storage protein II basic chain;
Flags: Precursor;

So Bio.SeqIO removes the leading spaces from each DE line, which flattens the indentation of the entries under "Contains:", but Bio.SwissProt keeps them.
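To make the difference concrete, here is a minimal sketch of the two styles applied to a few of the DE lines above. The function names are made up for illustration, and the slicing assumes the UniProtKB flat-file layout (line code in columns 1-2, columns 3-5 blank, data from column 6). Neither function is taken from the Bio.SwissProt or Bio.SeqIO source.

```python
# Hypothetical helpers, not the actual Biopython code. Both assume the data
# part of a flat-file line starts at column 6 (index 5).


def description_keep_indent(lines):
    """Bio.SwissProt style: drop only the 'DE   ' prefix, keep nesting."""
    return "\n".join(line[5:].rstrip() for line in lines if line.startswith("DE"))


def description_stripped(lines):
    """Bio.SeqIO style: also drop the leading spaces of nested entries."""
    return "\n".join(line[5:].strip() for line in lines if line.startswith("DE"))


de_lines = [
    "DE   RecName: Full=11S globulin seed storage protein 2;",
    "DE   Contains:",
    "DE     RecName: Full=11S globulin seed storage protein 2 acidic chain;",
    "DE   Flags: Precursor;",
]

print(description_keep_indent(de_lines))
print(description_stripped(de_lines))
```

The only difference between the two is `rstrip()` versus `strip()` on the data part, so the nested RecName line keeps its two leading spaces in the first output and loses them in the second.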
For consistency, I think we should settle on one of these two styles.
My preference is for the approach used by Bio.SwissProt, which keeps the indentation. Any objections to changing Bio.SeqIO to match it?
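If Bio.SeqIO adopted the indented style, code that wants the flat form would lose nothing, since that form can be rebuilt line by line, while the indentation keeps the "Contains:" grouping visible. Below is a sketch of reading that grouping back, assuming two spaces per nesting level as in the example above; `nesting` is a hypothetical helper, not a Biopython function.

```python
# Hypothetical helper, not part of Biopython: recover the nesting of an
# indented (Bio.SwissProt-style) description, assuming two spaces per level.


def nesting(description):
    """Return (depth, text) pairs for each line of an indented description."""
    pairs = []
    for line in description.splitlines():
        text = line.lstrip(" ")
        depth = (len(line) - len(text)) // 2
        pairs.append((depth, text))
    return pairs


indented = (
    "RecName: Full=11S globulin seed storage protein 2;\n"
    "Contains:\n"
    "  RecName: Full=11S globulin seed storage protein 2 acidic chain;\n"
    "Flags: Precursor;"
)

pairs = nesting(indented)
# The flattened (Bio.SeqIO-style) text is just the depth-stripped lines, but
# the flat text no longer marks which RecName/AltName lines belong to a
# "Contains:" block.
flat = "\n".join(text for _, text in pairs)

for depth, text in pairs:
    print(depth, text)
```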

--Michiel.
