[Biopython] Support for Xdna, SnapGene and GCK formats
Damien Goutte-Gattat
dgouttegattat at incenp.org
Tue Jul 30 18:06:04 UTC 2019
[Resending this mail, as a broken DKIM signature may have led
DMARC-compliant spam filters to outright reject the message. Apologies
to those who did receive it at the first attempt.]
Hi Biopython folks,
Last December I wrote to this mailing list [1] to present new parsers for
potential addition to Biopython's SeqIO module. That mail may have gone
unnoticed, so allow me to present those parsers again.
There were initially two parsers, one for the "Xdna" format (used by DNA
Strider and Serial Cloner [2]) and one for the SnapGene format (used by, well,
SnapGene [3]). Since then I added a third parser for the "GCK" format (used by
Gene Construction Kit [4]).
Those parsers are currently available in a Python module called
"incenp.binseqs". You can find the source code of that module on my forge [5];
the module can also be installed from PyPI (`pip install incenp.binseqs`).
If you want to test them, all you have to do after installing the module is
import the `incenp.bio.seqio` package after importing Biopython's SeqIO. You
may then use SeqIO's standard API:
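from Bio import SeqIO
# Must come after the SeqIO import: this makes the new formats available
import incenp.bio.seqio
# Each file holds a single sequence, so SeqIO.read() returns one record
xdna_record = SeqIO.read('serialcloner_file.xdna', 'xdna')
snap_record = SeqIO.read('snapgene_file.dna', 'snapgene')
gck_record = SeqIO.read('gck_file.gck', 'gck')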
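As a rough sketch (not part of the module), the format name could also be
picked from the file extension. The mapping below only covers the three
extensions used in the example above, and the helper names `guess_format` and
`read_any` are hypothetical:

```python
import os

# Extension-to-format mapping, taken from the example above. This is an
# assumption for illustration: other tools may use different extensions.
FORMAT_BY_EXTENSION = {
    ".xdna": "xdna",
    ".dna": "snapgene",
    ".gck": "gck",
}


def guess_format(path):
    """Return the SeqIO format name for a path, or raise ValueError."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError("unrecognised extension: %r" % extension)


def read_any(path):
    """Read a single record, choosing the parser from the file extension."""
    # Imported lazily so that guess_format() works without Biopython.
    from Bio import SeqIO
    import incenp.bio.seqio  # noqa: F401 (makes the extra formats available)

    return SeqIO.read(path, guess_format(path))
```

With both modules installed, `read_any('snapgene_file.dna')` should behave like
the explicit `SeqIO.read('snapgene_file.dna', 'snapgene')` call.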
I wrote the SnapGene parser using a (partial) specification from the vendor. I
had no such specification for the Xdna and GCK formats, so I wrote the
corresponding parsers after reverse-engineering some sample files I had.
Obviously I can make no guarantees about their correctness.
I would like to propose those parsers for inclusion into Biopython. There
seems to be some interest [6], so I plan to change the namespace from my own
`incenp.bio.seqio` to Biopython's `Bio.SeqIO`, update the license to match
Biopython's current licensing terms, and then prepare a pull request against
the latest code base.
Any comments, suggestions or criticism on that proposal are welcome.
Regards,
- Damien
[1] https://lists.open-bio.org/pipermail/biopython/2018-December/016574.html
[2] http://serialbasics.free.fr/Serial_Cloner.html
[3] https://www.snapgene.com/
[4] http://www.textco.com/gene-construction-kit.php
[5] https://git.incenp.org/damien/binseqs
[6] https://twitter.com/pjacock/status/1155932488813797378