[Biopython] format fasta files to genbank: problem with too long Locus identifier
Björn Johansson
bjorn_johansson at bio.uminho.pt
Sat Jul 17 07:32:12 EDT 2010
Hi all, this is an example of parsing a FASTA file and then trying to
convert it to GenBank.
It seems that the FASTA header line is not split at the "|" characters, and
the whole header identifier ends up as "LOCUS" in the GenBank file. Is this
the expected behavior? Can this be configured somehow?
Thanks for any help on this!
/bjorn
>>> from Bio import SeqIO
>>> a=SeqIO.read("newfile.fasta", "fasta")
>>> a
SeqRecord(seq=Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...GGG', SingleLetterAlphabet()), id='gi|2765658|emb|Z78533.1|CIZ78533', name='gi|2765658|emb|Z78533.1|CIZ78533', description='gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA', dbxrefs=[])
>>> a.format('fasta')
'>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA\nCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAA\nCGATCGAGTGAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGT\nGACCCTGATTTGTTGTTGGG\n'
>>> a.format('genbank')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqRecord.py", line 638, in format
    return self.__format__(format)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqRecord.py", line 652, in __format__
    SeqIO.write([self], handle, format_spec)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py", line 398, in write
    count = writer_class(handle).write_file(sequences)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/Interfaces.py", line 271, in write_file
    count = self.write_records(records)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/Interfaces.py", line 256, in write_records
    self.write_record(record)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/InsdcIO.py", line 628, in write_record
    self._write_the_first_line(record)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/InsdcIO.py", line 453, in _write_the_first_line
    raise ValueError("Locus identifier %s is too long" % repr(locus))
ValueError: Locus identifier 'gi|2765658|emb|Z78533.1|CIZ78533' is too long
>>>
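
One possible workaround, sketched here in plain Python: pick a shorter name out of the "|"-separated FASTA identifier and use it as the record's name before writing GenBank. The helper short_locus_name and its max_len=16 default are assumptions for illustration (16 characters being the limit of the traditional LOCUS line); neither is part of Biopython.

```python
def short_locus_name(fasta_id, max_len=16):
    """Pick a short LOCUS-style name from a '|'-separated FASTA identifier.

    Hypothetical helper for illustration only, not a Biopython function.
    max_len=16 is an assumed limit based on the traditional LOCUS line.
    """
    # Split on '|' and drop empty fields (e.g. from a trailing '|').
    fields = [field for field in fasta_id.split("|") if field]
    # Prefer the last field that fits (e.g. 'CIZ78533'), then earlier ones.
    for field in reversed(fields):
        if len(field) <= max_len:
            return field
    # No field fits: fall back to truncating the whole identifier.
    return fasta_id[:max_len]


record_id = "gi|2765658|emb|Z78533.1|CIZ78533"
locus = short_locus_name(record_id)
print(locus)  # CIZ78533
```

With the record from the session above, the result would be assigned as a.name = short_locus_name(a.id) before calling a.format('genbank'). Depending on the Biopython version, the GenBank writer may additionally require a DNA, RNA, or protein alphabet on the sequence (or, in recent releases, a molecule_type annotation), so the rename alone may not be sufficient.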