[Bioperl-l] Allowing One error in Sequence matching

Smithies, Russell Russell.Smithies at agresearch.co.nz
Wed Sep 16 23:06:45 UTC 2009


How about chunking it into overlapping words, skipping any word with more than one N, then applying a regex?

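use strict;
use warnings;

my $seq = "CGATCGNATGNCGTCTAGCTGACANGTTGACTCTAGCTGATCGATCGATCGTACGTANNCGTAGTCGTACNTACGATCTNACGCACGNATGCTACGTACG";
my $motif = "ACGT";

# Each position may be the motif base or an N, e.g. ACGT -> [AN][CN][GN][TN]
my $w = '';
$w .= "[${_}N]" for split //, $motif;
my $len = length $motif;

# The zero-width lookahead captures every overlapping word of the motif's length
foreach my $word ($seq =~ /(?=(\w{$len}))/g) {
  next if ($word =~ tr/N//) >= 2;    # more than one N means more than one error
  print "$word\n" if $word =~ /^$w$/;
}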
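Note that the [AN][CN][GN][TN] pattern only treats N as a wildcard, so a real substitution (AGCT vs GGCT, as in the question) is never reported. Below is a minimal sketch of the same overlapping-window idea with a user-defined mismatch limit. It uses Perl's string XOR to count differing positions; the fuzzy_positions helper and its $max_mm parameter are my own illustration, not something from the original reply, and an N in the sequence simply counts as one mismatch here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start positions of every window in
# $seq that differs from $motif at no more than $max_mm positions.
sub fuzzy_positions {
    my ($seq, $motif, $max_mm) = @_;
    my $len = length $motif;
    my @hits;
    for my $i (0 .. length($seq) - $len) {
        my $word = substr($seq, $i, $len);
        # XOR of two equal-length strings yields "\0" wherever the characters
        # agree, so counting the non-NUL bytes gives the number of mismatches.
        my $diff = $word ^ $motif;
        my $mm   = ($diff =~ tr/\0//c);
        push @hits, $i if $mm <= $max_mm;
    }
    return @hits;
}

my $seq   = "CGATCGNATGNCGTCTAGCTGACANGTTGACTCTAGCTGATCGATCGATCGTACGTANNCGTAGTCGTACNTACGATCTNACGCACGNATGCTACGTACG";
my $motif = "ACGT";

# Allow at most one mismatch; print position and the matching word
for my $pos (fuzzy_positions($seq, $motif, 1)) {
    printf "%d\t%s\n", $pos, substr($seq, $pos, length $motif);
}
```

This never builds the variant list, so the cost per window is just one XOR and one count, whatever the mismatch limit is.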



> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Thursday, 17 September 2009 9:42 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Allowing One error in Sequence matching
> 
> Hi All
> 
> I can't think of a smart way to do sequence matching that allows a
> user-defined number of mismatches.
> 
> For example:
> 
> The query AGCT should be considered a match to the reference if any
> one base position (1, 2, 3 or 4) has a mismatch, i.e. any of [ACGTN],
> so the possible matches are:
> 
> This is for position 1.
> AGCT
> GGCT
> CGCT
> TGCT
> NGCT
> and likewise for each position.
> 
> Is there a nice regular expression for this? One way I could think of
> is to generate all the possible tags for a given sequence and then do
> the matching, but that will be computationally expensive for long
> datasets. Any neat method?
> 
> Thanks,
> -Abhi
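For comparison, the "generate all possible tags" approach from the question can be sketched as below. The variants helper and the fixed ACGTN alphabet are illustrative assumptions, and the short test sequence is made up for the demo. For a query of length L over ACGTN, the number of tags within k substitutions is the sum over i = 0..k of C(L, i) * 4^i: 17 for AGCT with k = 1, but already 3121 for a 20-mer with k = 2, which is why this gets expensive quickly.

```perl
use strict;
use warnings;

my @ALPHABET = qw(A C G T N);

# Hypothetical helper: every string within $k substitutions of $query,
# drawing replacement bases from @ALPHABET (breadth-first, duplicates removed).
sub variants {
    my ($query, $k) = @_;
    my %seen     = ($query => 1);
    my @frontier = ($query);
    for (1 .. $k) {
        my @next;
        for my $s (@frontier) {
            for my $i (0 .. length($s) - 1) {
                for my $base (@ALPHABET) {
                    my $v = $s;
                    substr($v, $i, 1) = $base;
                    push @next, $v unless $seen{$v}++;
                }
            }
        }
        @frontier = @next;
    }
    return sort keys %seen;
}

my @tags = variants("AGCT", 1);
print scalar(@tags), " tags\n";

# One alternation regex over all tags; the lookahead reports overlapping hits.
my $alt = join '|', map { quotemeta($_) } @tags;
my $seq = "TTAGCTTGGCTANGCTAA";
while ($seq =~ /(?=($alt))/g) {
    printf "%d\t%s\n", $-[0], $1;
}
```

It works fine for short tags and k = 1, but the tag list (and the regex built from it) grows combinatorially with L and k, whereas counting mismatches per window does not.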