[Biopython] next-gen sequencing software
Jose Blanca
jblanca at btc.upv.es
Fri Jul 24 08:53:15 UTC 2009
Hi:
We have been writing some code that we think could be interesting to the
Biopython community. Right now we're mainly interested in the new sequencing
technologies, especially in:
- cleaning of the raw reads provided by the sequencers.
- parsing of the assembler results (ace, caf and bowtie map files).
- SNP detection and mining.
- sequence annotation.
We're writing some software to deal with those problems. The software is not
finished yet, but it is starting to be useful. Everything is written in Python.
We have used Biopython for some things, but for others we have taken a
slightly different approach. If the Biopython developers think that some of
our ideas could be of any use, we would be willing to incorporate them into
Biopython.
If you want to take a look just go to:
http://bioinf.comav.upv.es/svn/biolib/biolib/src/
We recently finished the cleaning infrastructure. We don't yet have
pipelines defined for all the new sequencing technologies, but we have created
a pipeline system that is very easy to modify. With just a dozen lines of code,
a new pipeline suited to a new sequencing technology can be created. There's
also a script that runs those pipelines (run_cleannig_pipeline.py).
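To give an idea of what such a pipeline could look like, here is a minimal
sketch. It is not the code in the repository: the Read class, the step
functions (trim_low_quality, drop_short), run_pipeline and the thresholds are
all hypothetical names chosen for illustration. The only assumption taken from
the description above is that a pipeline is a short, easily edited list of
cleaning steps.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Read:
    """Hypothetical raw read: a name, a sequence and per-base qualities."""
    name: str
    seq: str
    qual: List[int]


# A cleaning step takes a read and returns the cleaned read,
# or None when the read should be discarded.
Step = Callable[[Read], Optional[Read]]


def trim_low_quality(min_qual: int) -> Step:
    """Build a step that trims bases below min_qual from both ends."""
    def step(read: Read) -> Optional[Read]:
        start, end = 0, len(read.seq)
        while start < end and read.qual[start] < min_qual:
            start += 1
        while end > start and read.qual[end - 1] < min_qual:
            end -= 1
        return Read(read.name, read.seq[start:end], read.qual[start:end])
    return step


def drop_short(min_len: int) -> Step:
    """Build a step that discards reads shorter than min_len."""
    def step(read: Read) -> Optional[Read]:
        return read if len(read.seq) >= min_len else None
    return step


def run_pipeline(reads: Iterable[Read], steps: List[Step]) -> Iterator[Read]:
    """Apply every step in order to each read, skipping discarded reads."""
    for read in reads:
        for step in steps:
            read = step(read)
            if read is None:
                break
        else:
            yield read


# A pipeline for a new technology is just a new list of steps
# (the thresholds here are arbitrary example values).
example_pipeline = [trim_low_quality(20), drop_short(4)]

reads = [
    Read("r1", "ACGTACGT", [5, 30, 30, 30, 30, 30, 30, 5]),
    Read("r2", "ACGT", [5, 5, 30, 5]),
]
cleaned = list(run_pipeline(reads, example_pipeline))
```

In this sketch r1 loses one low-quality base at each end and r2 is dropped
because too little of it survives the trimming. Supporting another sequencing
technology would only mean writing a different list of steps.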
We have also written a set of scripts that generate statistics to help
evaluate the quality of the cleaning process.
Regarding the SNPs, we can get them from ace and caf files, and we're finishing
the parsing of the bowtie map files. All these files are transformed into an
iterator of contig objects. There is also functionality to get SNPs and
statistics from these contig objects.
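As a rough illustration of the "iterator of contig objects" idea, the sketch
below calls SNPs from a stream of contigs. It is an assumption-laden toy, not
our actual implementation: the Contig class, find_snps, snps_in_contigs and
the min_count threshold are hypothetical, and the reads are assumed to be
already aligned and padded to the same length ('-' for a gap).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class Contig:
    """Hypothetical contig: a name plus aligned reads of equal length."""
    name: str
    reads: List[str]


def find_snps(contig: Contig, min_count: int = 2) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (position, allele counts) for columns with two or more
    alleles that are each supported by at least min_count reads."""
    for pos, column in enumerate(zip(*contig.reads)):
        counts = Counter(base for base in column if base in "ACGT")
        alleles = {base: n for base, n in counts.items() if n >= min_count}
        if len(alleles) >= 2:
            yield pos, alleles


def snps_in_contigs(contigs: Iterable[Contig]) -> Iterator[Tuple[str, int, Dict[str, int]]]:
    """Consume contigs lazily, whatever file format they were parsed from."""
    for contig in contigs:
        for pos, alleles in find_snps(contig):
            yield contig.name, pos, alleles


def example_contigs() -> Iterator[Contig]:
    """Stand-in for a parser; a real one would read ace, caf or map files."""
    yield Contig("contig1", ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "AC-TT"])
    yield Contig("contig2", ["GGGG", "GGGG"])


snps = list(snps_in_contigs(example_contigs()))
```

Because the SNP code only sees contig objects, the same function works no
matter which parser produced them. In the example only position 2 of contig1
qualifies; the single T at the last position is below the min_count threshold
and is treated as a likely sequencing error.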
We welcome comments, suggestions and criticism.
Best regards,
--
Jose M. Blanca
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
P.S. We're using this functionality on a computer cluster, so everything is
parallelized.