[Biopython] matching sequences from fasta files
Chris Fields
cjfields at illinois.edu
Wed Mar 10 14:31:39 UTC 2010
On Mar 10, 2010, at 5:15 AM, Ivan Rossi wrote:
> On Wed, 10 Mar 2010, Peter wrote:
>
>> For the special case of looking for perfect matches, you would be fine
>> with just Python - depending on your data files, you may be able to
>> match on the record identifiers
>
> Don't trust that. We have seen it many, many times: the sequence changes between releases of a database while the id stays the same.
If the database has a proper versioning scheme or date information, this should be detectable; otherwise, I agree.
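When no version or date is available, a change hidden behind an unchanged id can still be caught by comparing the sequences themselves. The sketch below flags identifiers that appear in both an old and a new release but carry different sequences. It assumes each release has already been read into (identifier, sequence) pairs, for example via Biopython's `SeqIO.parse(path, "fasta")`, and it treats letter case as insignificant. `changed_under_same_id` is a hypothetical helper, not part of Biopython.

```python
from typing import Iterable, List, Tuple

# Assumed input shape: (identifier, sequence) pairs, e.g. built with Biopython as
#   [(rec.id, str(rec.seq)) for rec in SeqIO.parse("old.fasta", "fasta")]
Record = Tuple[str, str]


def changed_under_same_id(old: Iterable[Record], new: Iterable[Record]) -> List[str]:
    """Return ids present in both releases whose sequence differs (case-insensitive)."""
    old_seqs = {ident: seq.upper() for ident, seq in old}
    changed = {
        ident
        for ident, seq in new
        if ident in old_seqs and old_seqs[ident] != seq.upper()
    }
    return sorted(changed)


if __name__ == "__main__":
    old_release = [("P1", "MKVL"), ("P2", "ACDE")]
    new_release = [("P1", "mkvl"), ("P2", "ACDF"), ("P3", "GGG")]
    # P1 differs only in case, P3 is new, so only P2 is reported.
    print(changed_under_same_id(old_release, new_release))  # ['P2']
```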
> it is much more robust to compare SHA1 (or MD5) hashes of the sequence, or do string comparisons.
Agreed there; it's probably the only foolproof way.
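A minimal sketch of the hash-based comparison, in plain Python. It assumes the records are already (identifier, sequence) pairs, e.g. `[(rec.id, str(rec.seq)) for rec in SeqIO.parse(path, "fasta")]`, and that whitespace and letter case are not meaningful. One file is indexed by the SHA1 digest of each normalised sequence, and the other is looked up against that index. `normalise`, `seq_digest` and `find_identical` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, sequence)


def normalise(seq: str) -> str:
    """Drop whitespace and upper-case, so 'acg t' and 'ACGT' compare equal."""
    return "".join(seq.split()).upper()


def seq_digest(seq: str) -> str:
    """SHA1 hex digest (40 characters) of the normalised sequence."""
    return hashlib.sha1(normalise(seq).encode("utf-8")).hexdigest()


def find_identical(query: Iterable[Record], target: Iterable[Record]) -> Dict[str, List[str]]:
    """Map each query id to the target ids whose sequence is identical."""
    # Index the target records by digest; identical sequences share a bucket.
    index: Dict[str, List[str]] = defaultdict(list)
    for ident, seq in target:
        index[seq_digest(seq)].append(ident)

    matches: Dict[str, List[str]] = {}
    for ident, seq in query:
        hits = index.get(seq_digest(seq))  # .get does not create empty buckets
        if hits:
            matches[ident] = list(hits)
    return matches


if __name__ == "__main__":
    query = [("q1", "ACGT"), ("q2", "TTTT"), ("q3", "acg t")]
    target = [("t1", "ACGT"), ("t2", "GGGG"), ("t3", "ACGT")]
    print(find_identical(query, target))
    # {'q1': ['t1', 't3'], 'q3': ['t1', 't3']}
```

Keying on digests keeps only 40-character strings in memory rather than whole sequences, and each lookup is a dictionary access instead of an all-against-all comparison. If even a SHA1 collision is unacceptable, the hits can be confirmed with a plain string comparison; `hashlib.md5` can be swapped in the same way.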
>> or simply do string comparisons of the sequences.
>
> This is OK.
>
> --
> Ivan Rossi, PhD - ivan AT biodec dot com OR ivan dot rossi3 AT unibo dot it
> BioDec Srl, Via Calzavecchio 20/2, I-40033 Casalecchio di Reno (BO), Italy
> Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com
chris (peeking in from bioperl ;)