[Biopython] matching sequences from fasta files

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Wed Mar 10 17:27:50 UTC 2010


On Wed, Mar 10, 2010 at 4:19 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> I am considering just using python and regular expressions. Blast is
> great but I don't seem to be able to easily filter it to get only close
> matches that differ at 1 snp.

I am not sure I followed the whole discussion in this thread, but if
you want to find sequences that differ at one or two positions and you
don't need to do it in any explicit biological context, you could look
for algorithms that do fuzzy matching, like agrep.
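
To make the "differ at one or two positions" idea concrete, here is a
minimal sketch in plain Python. It assumes the sequences have the same
length and that only substitutions matter (no insertions or deletions);
the names hamming_distance and close_matches are made up for this
example and do not come from Biopython or agrep.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def close_matches(query, candidates, max_mismatches=1):
    """Return (candidate, distance) pairs within max_mismatches of query.

    Candidates whose length differs from the query are skipped, because
    this sketch does not handle insertions or deletions.
    """
    hits = []
    for candidate in candidates:
        if len(candidate) != len(query):
            continue
        distance = hamming_distance(query, candidate)
        if distance <= max_mismatches:
            hits.append((candidate, distance))
    return hits


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TCGA", "ACG"]
    for seq, dist in close_matches("ACGT", reads, max_mismatches=1):
        print(seq, dist)
```

Keeping only the pairs with a distance of exactly 1 would give the
sequences that differ at a single position.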

One example may be this module:
- http://www.personal.psu.edu/iua1/libs/apse.html
which, as you can read there, is outdated and probably won't work
properly, but it is based on a C library that may have been wrapped by
other python modules.
I would look for those, and also do a google/yahoo/any other search for
'string fuzzy matching python' or similar; I am sure you can find a
lot of literature and modules on the subject.
If you are comfortable with the unix shell, you can probably implement
the whole pipeline with some EMBOSS tool to read the sequences and
agrep to do the matching.
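
If you prefer to stay in Python, the same read-then-match pipeline could
be sketched roughly as below. This is only an illustration: read_fasta
and find_approx are hypothetical helpers written for this example, the
FASTA reader is deliberately naive (Bio.SeqIO would be the more robust
choice), and the search counts substitutions only, whereas agrep can
also allow insertions and deletions.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def find_approx(pattern, text, max_mismatches=1):
    """Return (start, mismatches) for each window of text that matches
    pattern with at most max_mismatches substitutions."""
    hits = []
    width = len(pattern)
    for start in range(len(text) - width + 1):
        mismatches = 0
        for p, t in zip(pattern, text[start:start + width]):
            if p != t:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # The inner loop finished without exceeding the limit.
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 demo\nACGTAC\nGTTT\n>seq2\nGGGG\n")
    for name, seq in read_fasta(fasta):
        print(name, find_approx("ACGA", seq, max_mismatches=1))
```

The sliding window is quadratic in the worst case, which is fine for
short probes but would be slow on whole genomes.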

Anyway, I didn't understand your use case very well, but I am sure
that if you search a bit more on the Internet you can find a tool that
already does this, without having to write and test a new script.
Using an existing tool would be better, both for you and for the
people who will read your papers.




-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it


