[Biopython] format fasta files to genbank: problem with too long Locus identifier

Björn Johansson bjorn_johansson at bio.uminho.pt
Sat Jul 17 11:32:12 UTC 2010


Hi all, this is an example of parsing a FASTA file and then trying to
convert it to GenBank format.
It seems that the FASTA header line is not split on the "|" characters, so
the whole identifier from the header is used as the "LOCUS" name in the
GenBank output, which then fails because the name is too long. Is this
the expected behavior? Can this be configured somehow?

Thanks for any help on this!
/bjorn



>>> from Bio import SeqIO
>>> a=SeqIO.read("newfile.fasta", "fasta")
>>> a
SeqRecord(seq=Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...GGG', SingleLetterAlphabet()), id='gi|2765658|emb|Z78533.1|CIZ78533', name='gi|2765658|emb|Z78533.1|CIZ78533', description='gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA', dbxrefs=[])
>>> a.format('fasta')
'>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA\nCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAA\nCGATCGAGTGAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGT\nGACCCTGATTTGTTGTTGGG\n'
>>> a.format('genbank')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqRecord.py", line 638, in format
    return self.__format__(format)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqRecord.py", line 652, in __format__
    SeqIO.write([self], handle, format_spec)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py", line 398, in write
    count = writer_class(handle).write_file(sequences)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/Interfaces.py", line 271, in write_file
    count = self.write_records(records)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/Interfaces.py", line 256, in write_records
    self.write_record(record)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/InsdcIO.py", line 628, in write_record
    self._write_the_first_line(record)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/InsdcIO.py", line 453, in _write_the_first_line
    raise ValueError("Locus identifier %s is too long" % repr(locus))
ValueError: Locus identifier 'gi|2765658|emb|Z78533.1|CIZ78533' is too long
>>>
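
One possible workaround, sketched here under assumptions rather than taken from the Biopython documentation, is to derive a shorter name from the "|"-separated header before writing GenBank. The helper short_locus_name below is hypothetical (it is not a Biopython function), it assumes an NCBI-style identifier like the one in the session above, and the 16-character limit is an assumption about what the GenBank writer accepts.

```python
def short_locus_name(fasta_id, max_len=16):
    """Return a name short enough for a GenBank LOCUS line.

    Hypothetical helper, not part of Biopython. Assumes an NCBI-style
    identifier such as 'gi|2765658|emb|Z78533.1|CIZ78533'; max_len=16
    is an assumed limit for the LOCUS name.
    """
    # Split on "|" and drop empty fields (some IDs end with a trailing "|").
    fields = [f for f in fasta_id.split("|") if f]
    # Use the last non-empty field if there is one, else the whole ID.
    candidate = fields[-1] if fields else fasta_id
    # Truncate as a last resort so the name never exceeds max_len.
    return candidate[:max_len]


print(short_locus_name("gi|2765658|emb|Z78533.1|CIZ78533"))  # CIZ78533
```

With the record from the session above, one would presumably set a.name = short_locus_name(a.id) before calling a.format('genbank'). Depending on the Biopython version, the GenBank writer may also need to be told the molecule type (a DNA alphabet on the Seq in older releases, or a.annotations["molecule_type"] = "DNA" in recent ones), so this alone may not be sufficient.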


