[Biopython] Motif search problem

Eric Ma ericmajinglong at gmail.com
Thu Jul 11 19:32:41 UTC 2013


Hi everybody,

We're having some problems doing a motif search.

We'd like to search a set of 2000 amino acid sequences for a set of motifs.
The motif is A{P}NL (PROSITE-style notation), where {P} means "any amino acid
but proline". We'd like to avoid manually creating a Seq() object for every
possible combination.

We have tried AXNL, but that searches for the literal string "AXNL" in the
sequence rather than treating X as a wildcard in a degenerate amino acid motif.
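One way to avoid enumerating combinations is to express the degenerate motif as a regular expression and search the plain sequence strings with Python's `re` module instead of `Bio.motifs`. This is a minimal sketch, assuming each Seq object can be turned into a string with `str()`; the `find_motif` helper and the sample accessions and sequences are hypothetical, made up for illustration.

```python
import re

# {P} ("any amino acid but proline") becomes a character class listing the
# 19 standard residues other than P. [^P] would also work, but it would
# additionally accept non-residue characters such as 'X' or '*'.
A_NOT_P_NL = re.compile(r"A[ACDEFGHIKLMNQRSTVWY]NL")

def find_motif(seq_str, pattern=A_NOT_P_NL):
    """Return (0-based position, matched text) pairs for every hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(seq_str)]

# Hypothetical data in the same shape as `sequences` in the question,
# with plain strings standing in for Seq objects.
sequences = [
    ["P00001", "MKANNLAPNLQQAGNL"],
    ["P00002", "MAPNLKK"],
]
for accession, seq in sequences:
    for pos, hit in find_motif(str(seq)):
        print(accession, pos, hit)
```

With the sample data this reports ANNL at position 2 and AGNL at position 12 of P00001, and skips both APNL occurrences.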

Sample code looks like the following:

from Bio import motifs
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq

instances = [Seq("ANNL", IUPAC.extended_protein)]  # <-- this is the troublesome line
m = motifs.create(instances)
# sequences is a list of lists, where each sublist looks like
# ['Accession (string)', Seq object]
for record in sequences:
    for pos, seq in m.instances.search(record[1]):
        print record[0], pos, seq

Does anybody have suggestions as to how we can go about modifying the
"instances" line so that we don't have to type in every single combination?
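If the `Bio.motifs` workflow is to be kept, the instance list can be generated rather than typed. This is a minimal sketch in pure Python, assuming the 20 standard amino acids are the alphabet of interest; `expand_pattern` is a hypothetical helper, not a Biopython function, and it only understands plain residues, `{...}` exclusions and `[...]` alternatives.

```python
from itertools import product

# The 20 standard amino acids (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def expand_pattern(pattern):
    """Hypothetical helper: expand e.g. 'A{P}NL' into every concrete string.

    Supports plain residues, {XY} = any residue except X or Y,
    and [XY] = X or Y. Malformed patterns are not checked.
    """
    choices = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "{[":
            close = "}" if ch == "{" else "]"
            end = pattern.index(close, i)
            group = pattern[i + 1:end]
            if ch == "{":
                # Exclusion: every residue not listed inside the braces.
                choices.append([aa for aa in AMINO_ACIDS if aa not in group])
            else:
                # Alternatives: only the residues listed inside the brackets.
                choices.append(list(group))
            i = end + 1
        else:
            # A plain residue matches only itself.
            choices.append([ch])
            i += 1
    # Cartesian product of the per-position choices gives every instance.
    return ["".join(combo) for combo in product(*choices)]

instances = expand_pattern("A{P}NL")
print(len(instances))  # 19
print(instances[:3])   # ['AANL', 'ACNL', 'ADNL']
```

Each string can then be wrapped as in the original line, e.g. `[Seq(s, IUPAC.extended_protein) for s in instances]`, and passed to `motifs.create`. The number of instances grows multiplicatively with every degenerate position, so for long or highly degenerate patterns a pattern-matching approach (for example a regular expression) may scale better.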

Cheers,
Eric

http://about.me/ericmjl


