[Biopython] Motif search problem
Chris Mitchell
chris.mit7 at gmail.com
Thu Jul 11 20:00:33 UTC 2013
This isn't Biopython code, but I frequently search all of the nr
proteins with this:
import re
# Both lists below come from the same ordered list of tuples,
# e.g. records = [(acc1, seq1), (acc2, seq2), ...]
proteins = '\n'.join(seq for acc, seq in records)
indexes = [acc for acc, seq in records]
# [^P\n] rather than [^P], so a match cannot span the newline between two sequences
sites = [match.start() for match in re.finditer(r'A[^P\n]NL', proteins)]
index = [indexes[proteins[:i].count('\n')] for i in sites]
It's amazingly fast for substring searches compared with looping over each sequence.
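
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The records are made-up example data, and the names (records, starts, hits) are only for illustration. Two things differ from the snippet above, both my own assumptions about what is wanted: the character class is [^P\n] so a hit cannot straddle the newline between two sequences, and the accession lookup uses bisect on precomputed start offsets instead of counting newlines for every hit.

```python
import re
from bisect import bisect_right

# Hypothetical example data: an ordered list of (accession, sequence) tuples.
records = [
    ("acc1", "MKAGNLTT"),
    ("acc2", "AAPNLAANL"),
    ("acc3", "GGGG"),
]

accessions = [acc for acc, _ in records]
sequences = [seq for _, seq in records]
proteins = "\n".join(sequences)

# Offset at which each sequence starts inside the joined string.
starts = []
offset = 0
for seq in sequences:
    starts.append(offset)
    offset += len(seq) + 1  # +1 for the newline separator

# A, then any residue except proline (and not the separator), then N, L.
motif = re.compile(r"A[^P\n]NL")

hits = []
for match in motif.finditer(proteins):
    # Last sequence whose start offset is <= the match position.
    idx = bisect_right(starts, match.start()) - 1
    position_in_sequence = match.start() - starts[idx]
    hits.append((accessions[idx], position_in_sequence, match.group()))

for accession, pos, found in hits:
    print(accession, pos, found)
```

Each hit is reported as (accession, 0-based position within that sequence, matched text). Note that re.finditer only returns non-overlapping matches, which is fine for this motif but may matter for others.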
On Thu, Jul 11, 2013 at 3:32 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
> Hi everybody,
>
> We're having some problems doing a motif search.
>
> We'd like to search a set of 2000 amino acid sequences for a set of motifs.
> The motif set is A{P}NL, where {P} means "any amino acid but proline".
> We're trying to avoid manually creating every Seq() object containing every
> combination.
>
> We have tried AXNL, but that searches for any "AXNL" (literally) in the
> sequence, not a degenerate amino acid sequence.
>
> Sample code looks like the following:
>
> instances = [Seq("ANNL", IUPAC.extended_protein)]  # <-- this is the line which is troublesome
> m = motifs.create(instances)
> # sequences is a list of lists, where each sublist looks like ['Accession(String)', 'Seq() Object']
> for record in sequences:
>     for pos, seq in m.instances.search(record[1]):
>         print record[0], pos, seq
>
> Does anybody have suggestions as to how we can go about modifying the
> "instances" line so that we don't have to type in every single combination?
>
> Cheers,
> Eric
> http://about.me/ericmjl