[Bioperl-l] Pattern recognition

Chris Hemmerich chemmeri at cgb.indiana.edu
Fri Oct 19 13:40:22 UTC 2007



Djoja,

Perhaps this is a problem with scope. Since you declare $position outside 
of

         while (my $seq_obj = $in->next_seq) {

it will not be reset for each sequence.  The line

 		$position = $position + length($seq_obj) - $patlen + 1;

will continue to increase $position for each sequence processed. From a 
quick look, I don't think you need $position at all in this script and can 
just print $-[0]+1.

Hope this helps,

  Chris


On Fri, 19 Oct 2007, Djodja wrote:

>
> Hi all,
>
> This is my first post. I am very puzzeled about a script that i wrote last
> year, that was working fine, and all of sudden, with no change to the code
> it just stopped doing what it was supposed to do. Basically I'm searching
> for patterns on a whole genome database.
>
> The pattern that I'm searching for at the moment is AACAAAG.
>
> My script finds the pattern, no problem on regexps or anything. It also
> searches the whole genome divided in fasta headers for each gene. The
> problem is that I need to know the position of the pattern, relative to the
> start of each gene. This is where the script goes weird. It doesn't reset
> the counter to the start of each gene. It instead continues to count
> patterns as if the genes aren't separated at all. Also I need the
> information on how many repetitions of the pattern there are in each gene,
> as above, it does not reset the counting of repetitions and instead gives me
> the repitions in the whole genome.
>
> here is the code:
>
> #!/usr/lib/perl -w
>
> use Bio::Perl;
> use Bio::Seq ();
> use Bio::SeqIO ();
> use Bio::Tools::SeqPattern ();
> use strict;
> use warnings;
>
>
>
>
> print 'Where is the file that you want to analise?';
> chomp (my $Fasfile = <>) or die "No such file";
> print "Where do you want to store the rough file?";
> chomp (my $rough = <>);
> print "Where do you want to store the final file?";
> chomp (my $final = <>);
>
> my ($pat, $patlen, $count_of_pat, $position) = ("AACAAAG", 7, 0, 0);
> my $pattern = new Bio::Tools::SeqPattern(-SEQ => $pat, -TYPE =>'Dna');
>
> if(my $in = Bio::SeqIO->new(-file => $Fasfile, -format => 'fasta' )) {
>   	open (MOTIFCOUNT,">> $rough") or die "The gene list wasn't created";
>
>
>        while (my $seq_obj = $in->next_seq) {
>
>                #my $id1 = $seq_obj->display_id();
> 				my $seq_length = length($seq_obj);
> 				#print "$seq_length\n";
> 				foreach ($seq_obj->seq =~ m/$pat/g){
> 								++$count_of_pat;
>
> 								my $id1 = $seq_obj->display_id();
> 								print $seq_obj->seq, "\n";
> 								print MOTIFCOUNT "$id1\t$count_of_pat\tposition ",
> 								$position + $-[0] + 1, "\n";
> 								$position = $position + length($seq_obj) - $patlen + 1;
> 								#print "$id1\n";
> 								#print "$count_of_pat\n";
> 								#print "$position\n";
>
> 								}
>
> 					#print "$seq_length\n";
> 					print $seq_obj->seq, "\n";
> 					}
>
>
>        close MOTIFCOUNT;
>
>        open (SYM, "+< $rough") or die "The file doesn't exist";
>
>        open (TAB, ">> $final") or die "The gene list wasn't transformed";
>
> 	print TAB "Motif: $pat\n";
>        while (my $line = <SYM>){
>
>        	foreach ($line){
>                	$line =~ s/\|/\t/g;
>        		print TAB "$line";
>        	}
>        }
>        close (SYM);
>        close(TAB);
>
>        print "The gene list was created and transformed successfully.\n\a"
> }
> exit;
>
>
>
> Can anyone help me out? This is kind of in the way of my PhD progression.
>
> All the best,
>
> Djodja
> -- 
> View this message in context: http://www.nabble.com/Pattern-recognition-tf4652923.html#a13293737
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list