[Biopython] Support for Xdna, SnapGene and GCK formats
Damien Goutte-Gattat
dgouttegattat at incenp.org
Tue Jul 30 16:41:17 UTC 2019
Hi Biopython folks,
Last December I wrote to this mailing list [1] to present new parsers
for potential addition to Biopython's SeqIO module. That mail may have
gone unnoticed, so allow me to present those parsers again.
There were initially two parsers, one for the "Xdna" format (used by DNA
Strider and Serial Cloner [2]) and one for the SnapGene format (used by,
well, SnapGene [3]). Since then I added a third parser for the "GCK"
format (used by Gene Construction Kit [4]).
Those parsers are for now available in a Python module called
"incenp.binseqs". You can find the source code of that module on my
forge [5]; the module can also be installed from PyPI (`pip install
incenp.binseqs`).
If you want to test them, all you have to do after installing the module
is import the `incenp.bio.seqio` package after importing Biopython's
SeqIO; you may then use SeqIO's standard API:
    from Bio import SeqIO
    import incenp.bio.seqio

    xdna_record = SeqIO.read('serialcloner_file.xdna', 'xdna')
    snap_record = SeqIO.read('snapgene_file.dna', 'snapgene')
    gck_record = SeqIO.read('gck_file.gck', 'gck')
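As a convenience, the format name could also be derived from the file
extension. The following is only a sketch: `guess_format` and
`read_record` are hypothetical helpers that are not part of
incenp.binseqs, and the extension-to-format mapping is an assumption
based on the sample file names used above.

```python
from pathlib import Path

# Hypothetical mapping: the format names are the ones used in the example
# above; the extensions are assumed from the sample file names.
FORMAT_BY_SUFFIX = {
    ".xdna": "xdna",
    ".dna": "snapgene",
    ".gck": "gck",
}


def guess_format(filename):
    """Return the SeqIO format name matching the file's extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unrecognised extension: {suffix!r}") from None


def read_record(filename):
    """Read a single record, picking the parser from the extension."""
    # Imported here so that guess_format() stays usable without Biopython.
    from Bio import SeqIO
    import incenp.bio.seqio  # noqa: F401  (must be loaded after Bio.SeqIO)

    return SeqIO.read(filename, guess_format(filename))
```

With the module installed, `read_record('snapgene_file.dna')` would then
be equivalent to the explicit `SeqIO.read(...)` call shown above.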
I wrote the SnapGene parser using a (partial) specification from the
vendor; I had no such specifications for the Xdna and GCK formats, so I
wrote the corresponding parsers after reverse-engineering some sample
files I had. Obviously I can make no guarantees about their correctness.
I would like to propose those parsers for inclusion into Biopython.
There seems to be some interest [6], so I plan to change the namespace
from my own `incenp.bio.seqio` to Biopython's `Bio.SeqIO`, update the
license to match Biopython's current licensing terms, and then
prepare a pull request against the latest code base.
Any comments, suggestions, or criticism on that proposal are welcome.
Regards,
- Damien
[1]
https://lists.open-bio.org/pipermail/biopython/2018-December/016574.html
[2] http://serialbasics.free.fr/Serial_Cloner.html
[3] https://www.snapgene.com/
[4] http://www.textco.com/gene-construction-kit.php
[5] https://git.incenp.org/damien/binseqs
[6] https://twitter.com/pjacock/status/1155932488813797378