[Biopython] matching sequences from fasta files

Vincent Davis vincent at vincentdavis.net
Wed Mar 10 15:19:00 UTC 2010


I am considering using just Python and regular expressions. BLAST is
great, but I don't seem to be able to easily filter its output to get only
close matches that differ at 1 SNP.
I have a custom microarray and a list of the sequences it will bind. I need
to test whether they are in the genome of Toxoplasma gondii (just yes or no),
whether there are close matches (differing at 1 SNP), and where the
difference is in the sequence.

So from reading the responses, I should consider Python's re module, or look
more into FASTA or needle to see if I can get my version of a close match
from them. Is this right? Like I said, I am very new to this; I just got
called in to get this project done.
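
For concreteness, here is a minimal sketch of the pure-Python route, not a tested pipeline. It assumes the genome (or one chromosome) fits in memory as a single uppercase string, that a "close match" means exactly one substitution (no insertions or deletions), and that only the forward strand is searched. The function names are hypothetical. It relies on the observation that a hit with at most one mismatch must contain at least one half of the probe exactly, so it looks up each half with str.find and then verifies the full window.

```python
def mismatch_positions(a, b, limit=1):
    """Return positions where equal-length strings differ, or None if more than `limit`."""
    diffs = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            diffs.append(i)
            if len(diffs) > limit:
                return None
    return diffs


def find_near_matches(probe, genome):
    """Yield (genome_start, mismatch_positions) for hits with at most one substitution.

    `genome` is assumed to be uppercase already; the reverse strand is not searched.
    """
    probe = probe.upper()
    n = len(probe)
    half = n // 2
    # With at most one mismatch, one of the two halves must match exactly.
    pieces = [(0, probe[:half]), (half, probe[half:])]
    seen = set()
    for offset, piece in pieces:
        pos = genome.find(piece)
        while pos != -1:
            start = pos - offset
            if 0 <= start <= len(genome) - n and start not in seen:
                diffs = mismatch_positions(probe, genome[start:start + n])
                if diffs is not None:
                    seen.add(start)
                    yield start, diffs
            pos = genome.find(piece, pos + 1)
```

An empty mismatch list means a perfect match; a one-element list gives the 0-based position of the SNP within the probe. To cover the other strand, the same function could be run on the reverse complement of each probe.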

  *Vincent Davis
720-301-3003 *
vincent at vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Wed, Mar 10, 2010 at 6:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Wed, Mar 10, 2010 at 11:15 AM, Ivan Rossi <ivan at biodec.com> wrote:
> > On Wed, 10 Mar 2010, Peter wrote:
> >
> >> For the special case of looking for perfect matches, you would be fine
> >> with just Python - depending on your data files, you may be able to
> >> match on the record identifiers
> >
> > Don't trust that. We have seen many, many times the sequence change
> > over time (in different releases of the databases) while keeping the
> > same id.
>
> Yes, be cautious about blindly matching on just the identifier.
> That's why I said "may" ;)
>
> > it is much more robust to compare SHA1 (or MD5) hashes of the
> > sequence, or do string comparisons.
>
> MD5 is known to have collisions, but Sebastián Bassi added support
> in Biopython for the GCG and SEGUID checksums, e.g. see:
>
> from Bio.SeqUtils.CheckSum import seguid
> help(seguid)
>
> SEGUID uses SHA1 internally and takes care of letter case.
>
> Peter
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
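
The checksum suggestion quoted above can be sketched with only the standard library. This is an illustration, not Biopython's seguid implementation: it hashes the uppercased sequence with SHA1 via hashlib and reports identifiers whose sequence changed between two releases. The function names and the id-to-sequence dictionaries are assumptions made for the example.

```python
import hashlib


def sequence_digest(seq):
    """Return a SHA1 hex digest of the sequence, ignoring letter case."""
    return hashlib.sha1(seq.upper().encode("ascii")).hexdigest()


def changed_ids(old, new):
    """Return sorted ids present in both releases whose sequences differ.

    `old` and `new` are assumed to be dicts mapping identifier -> sequence string.
    """
    return sorted(
        seq_id
        for seq_id in old.keys() & new.keys()
        if sequence_digest(old[seq_id]) != sequence_digest(new[seq_id])
    )
```

For sequences that are already in memory a direct string comparison works just as well; digests mainly help when storing or exchanging a compact fingerprint per record.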



