[Biopython] matching sequences from fasta files

Leighton Pritchard lpritc at scri.ac.uk
Wed Mar 10 15:53:45 UTC 2010


Hi,

On Wednesday, 10 March 2010, at 03:46, "Vincent Davis"
<vincent at vincentdavis.net> wrote:

> I need to check if any/all of the sequences from one fasta file are in another.
> Looking through the docs I think I could do this.

As others have pointed out, a simple string comparison will do this.
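
A minimal sketch of that string comparison, assuming "in another" means
whole-sequence identity (for substring matches you would test with Python's
`in` operator instead). The helper names `read_fasta` and `exact_matches`
are made up for illustration; with Biopython installed,
`Bio.SeqIO.parse(handle, "fasta")` could replace the hand-rolled reader.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the identifier.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            # Upper-case so that "acgt" and "ACGT" compare equal.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def exact_matches(query_handle, target_handle):
    """Map each query id to the ids of targets with an identical sequence."""
    by_sequence = {}
    for name, seq in read_fasta(target_handle):
        by_sequence.setdefault(seq, []).append(name)
    return {name: by_sequence.get(seq, [])
            for name, seq in read_fasta(query_handle)}


# Toy data standing in for the two FASTA files (assumed, not from the thread).
queries = StringIO(">q1\nACGT\n>q2\nGGGG\n")
targets = StringIO(">t1\nAC\nGT\n>t2\nTTTT\n")
print(exact_matches(queries, targets))  # {'q1': ['t1'], 'q2': []}
```

Indexing the targets by sequence in a dictionary means each query costs one
lookup, rather than one comparison per target.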
 
> I then want to find "close matches" and for me this means they differ by 1
> SNP and I need to know the location of this differing SNP. How would I do
> this?

There are many ways in which this *could* be done.  You probably want one
that is quite quick, though <grin>

If this were a one-off job that I never needed to do again, I would probably
run BLAST or FASTA (or my favourite search algorithm, running ungapped) using
one set of sequences as the query and the other as the target database,
setting the program parameters to report only one match per query.  I'd then
use Python to parse the results, throwing away all those matches where

i) the number of aligned bases equals the number of bases in the query, but
the number of match identities falls short of the number of aligned bases by
more than one (two or more mismatches)
ii) the number of aligned bases differs from the number of bases in the
query by exactly one, and the number of match identities differs from the
number of aligned bases (any mismatch at all)
iii) the number of aligned bases differs from the number of bases in the
query by two or more

The remainder should be your set of (almost) full-length matches with zero
or one SNP, and there should be enough data in your search program output to
identify the location of the SNP.
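
The three rejection rules can be sketched as a small filter. This is only an
illustration under stated assumptions: the search output is taken to be
whitespace-separated text with five columns (query id, target id, query
length, aligned length, number of identities), which is a hypothetical
layout rather than the default output of any particular program, and the
function names are invented for the example.

```python
def is_near_full_length_match(query_len, aligned_len, identities):
    """Return True if a hit survives rules (i) to (iii)."""
    length_gap = abs(query_len - aligned_len)
    mismatches = aligned_len - identities
    if length_gap == 0:
        # Rule (i): full-length alignment, at most one mismatch allowed.
        return mismatches <= 1
    if length_gap == 1:
        # Rule (ii): one base unaligned, so no mismatches allowed.
        return mismatches == 0
    # Rule (iii): two or more bases unaligned.
    return False


def filter_hits(lines):
    """Yield (query_id, target_id) for each hit that passes the rules."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Assumed column order; adjust to match your search program.
        query_id, target_id, query_len, aligned_len, identities = line.split()[:5]
        if is_near_full_length_match(
            int(query_len), int(aligned_len), int(identities)
        ):
            yield query_id, target_id


# Made-up rows in the assumed five-column layout.
rows = [
    "q1\tt1\t25\t25\t25",  # identical: kept
    "q2\tt2\t25\t25\t24",  # one mismatch: kept
    "q3\tt3\t25\t25\t23",  # two mismatches: dropped by (i)
    "q4\tt4\t25\t24\t24",  # one base unaligned, no mismatch: kept
    "q5\tt5\t25\t24\t23",  # one base unaligned plus a mismatch: dropped by (ii)
    "q6\tt6\t25\t23\t23",  # two bases unaligned: dropped by (iii)
]
print(list(filter_hits(rows)))  # [('q1', 't1'), ('q2', 't2'), ('q4', 't4')]
```

In case (ii) the unaligned base at one end of the query is presumably where
the SNP sits, which is why the hit is described as "(almost)" full-length.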

I think it would be faster to get a result by using something off-the-shelf
like BLAST and parsing the output than by writing something to do the search
yourself.  It will probably run quicker, too.

There are lots of ways to do this repeatably, including writing a generator
function.
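
As one illustration of the generator idea, here is a naive pure-Python
sketch that compares every query against every target of the same length
and yields the pairs differing at no more than one position, together with
the 0-based position of the difference. The names `snp_positions` and
`close_matches` and the toy sequences are assumptions made for the example.
It is an all-against-all comparison, so expect it to be slow on large
sequence sets, which is exactly where a search program should win.

```python
def snp_positions(query, target):
    """Return the 0-based positions at which two equal-length strings differ."""
    return [i for i, (a, b) in enumerate(zip(query, target)) if a != b]


def close_matches(queries, targets, max_snps=1):
    """Yield (query_id, target_id, positions) for near-identical pairs.

    queries and targets are dicts mapping identifiers to sequence strings.
    Only pairs of equal length are compared, so indels are not detected.
    """
    for q_id, q_seq in queries.items():
        for t_id, t_seq in targets.items():
            if len(q_seq) != len(t_seq):
                continue
            positions = snp_positions(q_seq, t_seq)
            if len(positions) <= max_snps:
                yield q_id, t_id, positions


queries = {"q1": "ACGTACGT"}
targets = {
    "t1": "ACGTACGT",  # identical
    "t2": "ACGAACGT",  # differs at position 3 only
    "t3": "TCGAACGT",  # differs at positions 0 and 3
    "t4": "ACG",       # different length, skipped
}
for hit in close_matches(queries, targets):
    print(hit)
# ('q1', 't1', [])
# ('q1', 't2', [3])
```

Because it is a generator, results are produced one pair at a time, so the
caller can stop early or stream hits to a file without holding them all in
memory.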

I hope this is useful,

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

