[Biopython-dev] Writing protein records in Swiss-Prot format

Adam Sjøgren asjo at koldfront.dk
Tue Oct 10 22:16:08 UTC 2017


  Hi,


In BioPerl I can create a Swiss-Prot record like this:

  #!/usr/bin/perl

  use Bio::Seq::RichSeq;
  use Bio::SeqFeature::Generic;
  use Bio::SeqIO;

  my $record=Bio::Seq::RichSeq->new(-id=>'TST01', -accession_number=>'Test', -division=>'Reviewed', -seq=>'FLY');
  $record->add_date('2017-10-10');
  $record->add_SeqFeature(Bio::SeqFeature::Generic->new(-start=>2, -end=>3, -primary=>'CDS'));
  Bio::SeqIO->new(-format=>'Swiss')->write_seq($record);

Running it produces:

  $ ./write_swissprot.pl
  ID   TST01                   Reviewed;           3 AA.
  AC   Test;
  DT   2017-10-10, integrated into UniProtKB/Swiss-Prot.
  FT   CDS           2      3
  SQ   SEQUENCE     3 AA;  441 MW;  7B5729A000000000 CRC64;
       FLY
  //
  $ 

as expected.

The Biopython wiki page on SeqIO has a file formats table showing that
reading the Swiss-Prot format is supported, but writing it is not:

 · http://biopython.org/wiki/SeqIO#file-formats

Trying it out confirms this; e.g. running:

  #!/usr/bin/python3

  from Bio.Seq import Seq
  from Bio.Alphabet import IUPAC
  from Bio.SeqRecord import SeqRecord
  from Bio.SeqFeature import SeqFeature, FeatureLocation

  record = SeqRecord(id='TST01', name='Test', seq=Seq('FLY', IUPAC.protein))
  record.features.append(SeqFeature(FeatureLocation(1, 3), type='CDS'))
  print(record.format('swiss'))

outputs:

  $ ./write_swissprot.py
  Traceback (most recent call last):
    File "./write_swissprot.py", line 10, in <module>
      print(record.format('swiss'))
    File "/usr/lib/python3/dist-packages/Bio/SeqRecord.py", line 672, in format
      return self.__format__(format)
    File "/usr/lib/python3/dist-packages/Bio/SeqRecord.py", line 696, in __format__
      SeqIO.write(self, handle, format_spec)
    File "/usr/lib/python3/dist-packages/Bio/SeqIO/__init__.py", line 496, in write
      % format)
  ValueError: Reading format 'swiss' is supported, but not writing
  $

If I change 'swiss' to 'embl', I do get a record out, just not in the
format I wanted.
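
For comparison, a hand-rolled writer for a record this simple is not much
code. What follows is only a minimal sketch, not Biopython API: the names
crc64 and format_swiss are hypothetical, the column widths are my
approximation of the BioPerl output above, DT lines are left out, and the
molecular weight is passed in rather than computed. The CRC64 polynomial is
an assumption that I checked only against the 'FLY' checksum shown above.
Features use Python-style coordinates, as in FeatureLocation(1, 3), and are
converted to the 1-based inclusive positions that the FT line uses.

```python
# Hypothetical stand-alone sketch; none of these names are Biopython API.

# Reflected CRC-64 polynomial (assumption: it reproduces the checksum
# that BioPerl printed for 'FLY' above).
CRC64_POLY = 0xD800000000000000


def crc64(sequence):
    """Return the 64-bit CRC of an ASCII sequence as 16 hex digits."""
    crc = 0
    for byte in sequence.encode("ascii"):
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC64_POLY if crc & 1 else crc >> 1
    return f"{crc:016X}"


def format_swiss(entry_id, accession, status, sequence, molecular_weight,
                 features=()):
    """Format one record as Swiss-Prot-style text.

    features holds (key, start, end) tuples in Python-style coordinates
    (0-based start, exclusive end).
    """
    lines = [
        f"ID   {entry_id:<24}{status};{len(sequence):>12} AA.",
        f"AC   {accession};",
    ]
    for key, start, end in features:
        # Swiss-Prot positions are 1-based and inclusive.
        lines.append(f"FT   {key:<8} {start + 1:>6} {end:>6}")
    lines.append(
        f"SQ   SEQUENCE   {len(sequence):>3} AA;  {molecular_weight} MW;  "
        f"{crc64(sequence)} CRC64;"
    )
    # Sequence block: 60 residues per line, in groups of 10.
    for offset in range(0, len(sequence), 60):
        chunk = sequence[offset:offset + 60]
        groups = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        lines.append("     " + " ".join(groups))
    lines.append("//")
    return "\n".join(lines) + "\n"


print(format_swiss("TST01", "Test", "Reviewed", "FLY", 441,
                   [("CDS", 1, 3)]), end="")
```

With the values from the BioPerl example this yields the same CRC64
(7B5729A000000000) and the same feature positions (2 to 3); whether the
spacing is acceptable to real Swiss-Prot parsers is something I have not
checked.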

My (possibly naive) question is: why isn't writing supported? Has nobody
needed it, has nobody written the code, or is there some other reason?


  Best regards,

    Adam

-- 
 "It will turn into pointer equality or something             Adam Sjøgren
  ghastly like that"                                     asjo at koldfront.dk


