[Biopython] matching sequences from fasta files

Ivan Rossi ivan at biodec.com
Wed Mar 10 11:15:38 UTC 2010


On Wed, 10 Mar 2010, Peter wrote:

> For the special case of looking for perfect matches, you would be fine
> with just Python - depending on your data files, you may be able to
> match on the record identifiers

Don't trust that. We have seen it happen many, many times: a sequence changes
between releases of a database while its ID stays the same.
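A minimal sketch of how to spot this, assuming each release has already been loaded into a dict mapping ID to sequence string (with Biopython, for example, `{rec.id: str(rec.seq) for rec in SeqIO.parse(path, "fasta")}`). The function name `changed_under_same_id` is hypothetical. It lists the IDs present in both releases whose sequences differ, which are exactly the cases where matching on identifiers gives a wrong answer without complaint. Ignoring case is also an assumption.

```python
def changed_under_same_id(old_release, new_release):
    """Return the IDs present in both releases whose sequences differ.

    Both arguments are dicts mapping record ID -> sequence string.
    Sequences are compared case-insensitively (an assumption).
    """
    shared = old_release.keys() & new_release.keys()
    return sorted(
        rec_id for rec_id in shared
        if old_release[rec_id].upper() != new_release[rec_id].upper()
    )


old = {"P1": "MKV", "P2": "ACDE"}
new = {"P1": "MKV", "P2": "ACDF", "P3": "GG"}
print(changed_under_same_id(old, new))  # ['P2']
```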

It is much more robust to compare SHA1 (or MD5) hashes of the sequences, or to
compare the sequence strings directly.
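Here is a sketch of hash-based matching that uses only Python's standard `hashlib`. It assumes records arrive as (id, sequence) pairs, for example `(rec.id, str(rec.seq))` from `SeqIO.parse`. Sequences are normalised by removing whitespace and upper-casing them. Whether case should matter, for instance with soft-masked lowercase regions, is an assumption to check against your data. The helper names `seq_digest` and `match_by_hash` are hypothetical.

```python
import hashlib
from collections import defaultdict


def seq_digest(seq, algorithm="sha1"):
    """Hex digest of a sequence after stripping whitespace and upper-casing it."""
    normalised = "".join(seq.split()).upper()
    return hashlib.new(algorithm, normalised.encode("ascii")).hexdigest()


def match_by_hash(records_a, records_b, algorithm="sha1"):
    """Pair up records from two files whose sequences are identical.

    records_a, records_b: iterables of (id, sequence) tuples.
    Returns a sorted list of (id_a, id_b) pairs.
    """
    # Index the first file by sequence digest; several IDs may share one.
    index = defaultdict(list)
    for rec_id, seq in records_a:
        index[seq_digest(seq, algorithm)].append(rec_id)

    # Look up each record of the second file by its digest.
    pairs = []
    for rec_id_b, seq in records_b:
        for rec_id_a in index.get(seq_digest(seq, algorithm), []):
            pairs.append((rec_id_a, rec_id_b))
    return sorted(pairs)


a = [("old_1", "ACGT"), ("old_2", "GGCC")]
b = [("new_9", "acgt"), ("new_7", "TTTT")]
print(match_by_hash(a, b))  # [('old_1', 'new_9')]
```

Indexing digests instead of full sequences keeps memory use small when the sequences are long. SHA1 and MD5 serve here only as content fingerprints, not for security. If collisions are a concern, confirm each hit with a direct string comparison.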

> or simply do string comparisons of the sequences.

This is OK.
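If you only need to know which sequences occur in both files, a plain string comparison via set intersection is enough. The sketch below assumes the sequences are already strings and that case should be ignored. The name `shared_sequences` is hypothetical.

```python
def shared_sequences(seqs_a, seqs_b):
    """Return the set of sequences (upper-cased) that occur in both inputs."""
    return {s.upper() for s in seqs_a} & {s.upper() for s in seqs_b}


print(sorted(shared_sequences(["ACGT", "gattaca"], ["GATTACA", "CCCC"])))  # ['GATTACA']
```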

--
Ivan Rossi, PhD - ivan AT biodec dot com OR ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, I-40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com


