[Biopython] Find Sub-sequence with Variable positions

Jurgens de Bruin debruinjj at gmail.com
Tue Jul 9 01:34:26 UTC 2013


Thanks for all the suggestions, both will work perfectly!


On 8 July 2013 17:37, Ivan Gregoretti <ivangreg at gmail.com> wrote:

> This is a way of doing it with Biopython's pairwise2.
>
> from Bio import pairwise2
>
> # set the parameters
> reward    =   5
> penalty   =  -4
> gapopen   = -30
> gapextend = -10
>
>
> # specify the sequence (query) and the pattern (subject)
> query = 'GTCGCGACGTTCGTACGTCGCGA'
> subject = 'ACGTACGTACGT'
>
> # run the pairwise aligner
> qseq, sseq, score, start, end = pairwise2.align.localms(
>     query, subject, reward, penalty, gapopen, gapextend)[0]
>
> # see the aligned query sequence
> qseq
> 'GTCGCGACGTTCGTACGTCGCGA'
>
> # see the aligned subject sequence
> sseq
> '------ACGTACGTACGT-----'
>
> # see score, start and end positions.
> score
> 51.0
>
> start
> 6
>
> end
> 18
>
> You can also BLAST 2 sequences from within Python if you need speed.
>
> Hope this helps,
>
> Ivan
>
> Ivan Gregoretti, PhD
>
> On Mon, Jul 8, 2013 at 10:06 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > On Mon, Jul 8, 2013 at 2:19 PM, Jurgens de Bruin <debruinjj at gmail.com>
> wrote:
> >> Hi,
> >>
> >> I hope someone can help me with the following:
> >>
> >> I want to find a sub-sequence within a sequence, but the catch is that
> >> the sub-sequence contains positions that are variable and do not have
> >> to match 100%.
> >> For example:
> >> in the following sub-sequence all the positions have to match, but
> >> position 5 (A) can be any of the 4 bases (ACGT) within the query sequence.
> >> ACGTACGTACGT
> >>
> >> Thanks!!!
> >
> > You could use a regular expression to do that - in Python, or at the
> > command line with something like EMBOSS dreg or fuzznuc:
> >
> > http://emboss.open-bio.org/rel/rel6/apps/dreg.html
> > http://emboss.open-bio.org/rel/rel6/apps/fuzznuc.html
> >
> > Peter
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
>
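
For the archive, here is a minimal sketch of the regular-expression
approach Peter suggested, using only the standard library re module.
It assumes the pattern from my original question (ACGTACGTACGT with
position 5 free to be any base) and reuses the query string from Ivan's
example; the variable names are my own and only illustrative.

```python
import re

# Pattern from the original question: ACGTACGTACGT, where position 5
# (1-based) may be any of A, C, G or T.
pattern = "ACGT" + "[ACGT]" + "CGTACGT"

# Query sequence reused from Ivan's pairwise2 example.
query = "GTCGCGACGTTCGTACGTCGCGA"

# Wrapping the pattern in a lookahead lets finditer report
# overlapping hits too, not just the first non-overlapping ones.
regex = re.compile("(?=(" + pattern + "))")

# Collect (start, end, matched text) using 0-based, end-exclusive indices.
hits = [(m.start(1), m.end(1), m.group(1)) for m in regex.finditer(query)]

for start, end, matched in hits:
    print(start, end, matched)
```

On this query it reports a single hit spanning 6 to 18, the same
region the pairwise2 alignment found. For more than one variable
position, each one can be replaced by its own character class.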



-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin



