[Biopython-dev] Sequential SFF IO

Brad Chapman chapmanb at 50mail.com
Fri Jan 28 12:34:18 UTC 2011


Kevin and Peter;
I'm really enjoying this discussion -- thanks for talking this
through here.

> For just 5' barcode detection, I am using a memoized scheme that computes
> anchored alignments and then stores the result in a hash table
> (match/mismatch, edit distance).  This approach allows me to reject barcodes
> with too small an edit distance to the next best candidate.  It is
> reasonably fast for our fairly long 454 barcode set (10-mers), though I do
> have an optional Cython version of the edit distance routine.  The
> pure-Python version is pretty zippy and can decode a 454 run in a minute or
> two.

This sounds like a nice approach. Do you have code available or is
it not packaged up yet?
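
For anyone following along, here is a minimal sketch of how the quoted
scheme might look. It is not Kevin's code, which has not been posted; the
function names, the thresholds (max_dist, min_margin) and the window
padding are all assumptions made for illustration. The edit distance is
anchored at the 5' end of the read, results are memoized per read prefix,
and a call is rejected when the runner-up barcode is too close.

```python
from functools import lru_cache


def anchored_edit_distance(barcode, read):
    """Edit distance between the whole barcode and the best-scoring prefix
    of the read (the alignment is pinned to the 5' end of the read)."""
    prev = list(range(len(read) + 1))  # empty barcode vs read[:j] costs j
    for i, b in enumerate(barcode, 1):
        cur = [i]  # barcode[:i] vs empty read prefix costs i
        for j, r in enumerate(read, 1):
            cur.append(min(prev[j] + 1,              # skip a barcode base
                           cur[j - 1] + 1,           # skip a read base
                           prev[j - 1] + (b != r)))  # match or mismatch
        prev = cur
    return min(prev)  # the read prefix may end anywhere


def make_decoder(barcodes, max_dist=2, min_margin=1, window_pad=2):
    """Return decode(read) -> (barcode, distance) or None.

    max_dist, min_margin and window_pad are illustrative defaults,
    not values taken from the thread.
    """
    barcodes = tuple(barcodes)
    window = max(len(b) for b in barcodes) + window_pad

    @lru_cache(maxsize=None)  # the hash table: one entry per distinct prefix
    def decode_prefix(prefix):
        scored = sorted((anchored_edit_distance(b, prefix), b)
                        for b in barcodes)
        best_dist, best = scored[0]
        if best_dist > max_dist:
            return None  # nothing close enough
        if len(scored) > 1 and scored[1][0] - best_dist < min_margin:
            return None  # runner-up is too close to call
        return best, best_dist

    def decode(read):
        return decode_prefix(read[:window])

    return decode
```

Usage would be along the lines of decode = make_decoder(["ACGTACGTAC",
"TTGGCCAATT"]) followed by decode(read_sequence) for each read. Because
many reads share the same first few bases, most calls become cache hits.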

I wrote up a barcode detector, remover and sorter for our Illumina
reads. There is nothing especially tricky in the implementation: it
looks for exact matches and then checks for approximate matches,
with gaps, using pairwise2:

https://github.com/chapmanb/bcbb/blob/master/nextgen/scripts/barcode_sort_trim.py

The "best_match" function could be replaced with different
implementations, using the rest of the script as scaffolding to do
all of the other sorting, trimming and output.
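
To illustrate the idea of a swappable matcher, here is a rough sketch of
that structure. It is not the code from the script linked above: the
function names are made up, the barcode is assumed to sit at the 5' end
of the read, and the default approximate matcher is a simple ungapped
mismatch count standing in for the gapped pairwise2 alignment. Any
callable with the same signature could be passed as best_match instead.

```python
from collections import defaultdict


def exact_match(read, barcodes):
    """Return (name, trim_length) if the read starts with a barcode."""
    for name, seq in barcodes.items():
        if read.startswith(seq):
            return name, len(seq)
    return None


def hamming_best_match(read, barcodes, max_mismatches=1):
    """Stand-in approximate matcher: ungapped mismatch count.

    Hypothetical replacement for a gapped alignment; returns
    (name, trim_length), or None when nothing is close or the best
    score is tied between two barcodes.
    """
    scored = []
    for name, seq in barcodes.items():
        prefix = read[:len(seq)]
        if len(prefix) == len(seq):
            mismatches = sum(a != b for a, b in zip(seq, prefix))
            scored.append((mismatches, name, len(seq)))
    scored.sort()
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous between two barcodes
    return scored[0][1], scored[0][2]


def sort_and_trim(reads, barcodes, best_match=hamming_best_match):
    """Bin reads by barcode name and strip the barcode from each read."""
    bins = defaultdict(list)
    for read in reads:
        # Cheap exact lookup first, approximate matching only on a miss.
        hit = exact_match(read, barcodes) or best_match(read, barcodes)
        if hit is None:
            bins["unmatched"].append(read)
        else:
            name, trim = hit
            bins[name].append(read[trim:])
    return dict(bins)
```

Here barcodes is a dict mapping a sample name to its barcode sequence,
and reads is any iterable of sequence strings; unmatched reads are kept
untrimmed in their own bin.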

Brad


