[Biopython] Problems with reading Swiss format records (swissprot specific date fields)

Jan T Kim jttkim at googlemail.com
Mon Mar 4 15:40:07 UTC 2013


Dear All,

Trying to parse the attached Swissprot record with Bio.SeqIO gives me this stack trace:

    Traceback (most recent call last):
      File "./swisstest", line 7, in <module>
	e = Bio.SeqIO.read(sys.argv[1], 'swiss')
      File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 599, in read
	first = iterator.next()
      File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 537, in parse
	for r in i:
      File "/usr/lib/pymodules/python2.7/Bio/SeqIO/SwissIO.py", line 97, in SwissIterator
	annotations['date'] = swiss_record.created[0]
    TypeError: 'NoneType' object has no attribute '__getitem__'

The problem is at line 99 (line 97 in my installed version) of
https://github.com/biopython/biopython/blob/master/Bio/SeqIO/SwissIO.py :

    annotations['date'] = swiss_record.created[0]

There is no "if swiss_record.created is not None" test or anything similar
guarding this statement. The parse function of Bio.SwissProt initialises the
created instance variable to None and sets it to the date only if it finds a
"DT" line containing the string "INTEGRATED" (case-insensitive).
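To make the failure mode concrete, here is a simplified, hypothetical re-creation of the DT handling described above. It is not Biopython's actual code. It assumes UniProtKB-style DT lines ("integrated into ...", "sequence version N", "entry version N") and stores each date as a (date, version) pair, which is consistent with the [0] indexing in SwissIO.py. The function name read_dt_dates is made up for illustration.

```python
def read_dt_dates(lines):
    """Hypothetical, simplified mimic of DT-line handling (not Biopython code).

    Returns (created, sequence_update); each is a (date, version) tuple,
    or None if no matching DT line was found.
    """
    created = None          # stays None unless an "integrated" line is seen
    sequence_update = None  # stays None unless a "sequence version" line is seen
    for line in lines:
        if not line.startswith("DT   "):
            continue
        # e.g. "27-JUN-2012, entry version 1." -> date, " entry version 1"
        value = line[5:].rstrip().rstrip(".")
        date, _, rest = value.partition(",")
        upper = rest.upper()  # case-insensitive matching
        if "INTEGRATED" in upper:
            created = (date, 0)
        elif "SEQUENCE VERSION" in upper:
            sequence_update = (date, int(upper.split()[-1]))
        # "entry version" lines touch neither value
    return created, sequence_update


# The only DT line in the attached seqret-generated record:
print(read_dt_dates(["DT   27-JUN-2012, entry version 1."]))  # (None, None)
```

Fed the single DT line of the attached record, both values stay None, which is exactly what the unguarded [0] lookups then trip over.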

The same kind of problem occurs with the sequence_update variable in the
next statement:

    annotations['date_last_sequence_update'] = swiss_record.sequence_update[0]

Would it be sensible to set the 'date' and 'date_last_sequence_update'
entries of the annotations dictionary only if the values are actually
found in the swiss_record? I understand that a genuine SwissProt record
should always contain them, but I ran into this with files generated from
the RefSeq protein database using the EMBOSS seqret program with
-osformat=swiss (the attached record has only an "entry version" DT line),
which doesn't seem like an entirely exotic use case to me.
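A minimal sketch of the guarded version, assuming only that the record object exposes created and sequence_update as either None or a (date, version) pair. The helper name date_annotations is hypothetical, not part of Biopython, and SimpleNamespace merely stands in for a parsed record.

```python
from types import SimpleNamespace


def date_annotations(swiss_record):
    """Hypothetical helper: collect date annotations only when present."""
    annotations = {}
    # Records without an "integrated into" DT line leave created as None.
    if swiss_record.created is not None:
        annotations['date'] = swiss_record.created[0]
    # Likewise for records without a "sequence version" DT line.
    if swiss_record.sequence_update is not None:
        annotations['date_last_sequence_update'] = swiss_record.sequence_update[0]
    return annotations


# Stand-in for the seqret-generated record: neither date was found.
seqret_like = SimpleNamespace(created=None, sequence_update=None)
print(date_annotations(seqret_like))  # {}

# Stand-in for a genuine SwissProt record with both dates.
genuine = SimpleNamespace(created=("01-JAN-1998", 0),
                          sequence_update=("15-JUL-1999", 2))
print(date_annotations(genuine))
# {'date': '01-JAN-1998', 'date_last_sequence_update': '15-JUL-1999'}
```

Omitting the keys, rather than storing None, lets downstream code test with "'date' in record.annotations"; that is a design choice of this sketch, not necessarily what Biopython would adopt.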

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
-------------- next part --------------
ID   ZP_10312765             Reviewed;         498 AA.
AC   ZP_10312765;
DT   27-JUN-2012, entry version 1.
DE   hypothetical protein FraQA3DRAFT_6339 [Frankia sp. QA3].
OS   Frankia sp. QA3.
RN   [1]
RP   1-498
RN   [2]
RP   1-498
KW   .
FT   REGION        1    498       Frankia sp. QA3. QA3. taxon:710111.
FT   REGION        1    498       hypothetical protein. 53620.
FT   REGION        1    498       FraQA3DRAFT_6339.
FT                                complement(NZ_CM001489.1:7362098..7363594
FT                                ). 11.
SQ   SEQUENCE   498 AA;  53751 MW;  39E328894991F8AC CRC64;
     mhphrvhpsr vhpspehpsp ehlsrehqsr prhataaara arsrpprphr agrrarrddr
     crqrsqraac lpggcpttcr dgrraptdrg hgshapgrgp taavpdlavp agcagpgrgg
     vgarhrrpaa artapgsqpt aaarrstags rvprgpgrrr sattrrgrrr prdalaarpa
     pvrvsvhgps grgpgrarrr pcrirgrchh dapggratap avggaprlvh rcggrrwqra
     rpgrggrdgp amptprssvp epgppgprhp rgpsrrpahp hwnptlggrr wpgvhrrdgr
     hgahrrrtip rpagrptrgr sgphrpapvr paagrhagng rcrpdhgrir rqppdagpas
     rsahthrgsr rlrrpggrps grrsdartgl arrsaagadq twpaprrwrh rrtnhrgrgs
     apgrhrsaap ptvpvphpar srpphdhgsg hprthrpgpt ghhaggrrpa rapghaagag
     rrrtapmrra rslclpsp
//

