[Biopython] matching sequences from fasta files
Ivan Rossi
ivan at biodec.com
Wed Mar 10 11:15:38 UTC 2010
On Wed, 10 Mar 2010, Peter wrote:
> For the special case of looking for perfect matches, you would be fine
> with just Python - depending on your data files, you may be able to
> match on the record identifiers
Don't trust that. We have seen many times that a sequence changes between
database releases while keeping the same id. It is much more robust to
compare SHA1 (or MD5) hashes of the sequences, or to compare the sequence
strings directly.
> or simply do string comparisons of the sequences.
This is OK.
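
As a rough illustration of the hash comparison, here is a minimal sketch in
plain Python. The helper names (read_fasta, sequence_digest, changed_records)
are made up for this example and are not part of Biopython; the small FASTA
reader could be swapped for Bio.SeqIO.parse. Upper-casing the sequence before
hashing is an assumption of mine, so drop it if case matters in your data.

```python
import hashlib


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # The identifier is the first word after '>' (empty if missing).
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_digest(seq):
    """Return the SHA1 hex digest of a sequence string."""
    # Assumption: case differences are not meaningful, so normalise first.
    return hashlib.sha1(seq.upper().encode("ascii")).hexdigest()


def changed_records(old_lines, new_lines):
    """Return ids found in both inputs whose sequence digest differs."""
    old = {name: sequence_digest(seq) for name, seq in read_fasta(old_lines)}
    changed = []
    for name, seq in read_fasta(new_lines):
        if name in old and old[name] != sequence_digest(seq):
            changed.append(name)
    return changed


# Toy data standing in for two releases of the same database.
old_release = [">P1 demo", "MKTAYIAK", "QRQISFVK", ">P2", "MSTNPKPQ"]
new_release = [">P1 demo", "MKTAYIAKQRQISFVK", ">P2", "MSTNPKPA"]
print(changed_records(old_release, new_release))
```

With the toy data above only P2 is reported: P1 is wrapped differently in
the two inputs but the joined sequence is the same, so its digest matches.
For real files, pass open file handles instead of the lists.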
--
Ivan Rossi, PhD - ivan AT biodec dot com OR ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, I-40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com