FW: [Bioperl-l] tr/// and eval

John Kloss jkloss@sapiens.wustl.edu
Thu, 10 Oct 2002 10:31:57 -0700


I just wanted to throw my solution out there.  It only traipses through
the sequence once (well, not strictly: counting dimers means looking at
n-1 residues twice, but the engine only bumps along once), and it uses
the regex engine for capture rather than function calls, which is
usually much faster.

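#!/usr/local/bin/perl
use strict;
use warnings;

my $ecoli = "ACGGCGATTATTTGGGGGCGGCAATATG";
my %count = ( );

pos( $ecoli ) = 0;	# just to be safe
while ( $ecoli =~ m/(.)/g ) {		# step along one residue at a time

        my $tmp = $1;
        $count{  $tmp    }++;				# single residue
        # \G anchors at the current pos(); without /g this match only
        # peeks at the next residue and does not advance pos().
        $count{ "$tmp$1" }++ if $ecoli =~ m/\G(.)/;	# dimer
}

foreach (sort keys %count) {
        print "$_ occurred $count{$_} times\n";
}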
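A quick way to check the claim that each residue is visited once while each of the n-1 adjacent pairs is also counted once: wrap the loop in a sub and sum the counts. This is a minimal sketch; the sub name count_residues is hypothetical and only there to make the loop reusable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the single-pass loop so its counts
# can be checked; the name count_residues is illustrative only.
sub count_residues {
    my ($seq) = @_;
    my %count;
    pos($seq) = 0;
    while ( $seq =~ m/(.)/g ) {
        my $tmp = $1;
        $count{$tmp}++;                           # single residue
        $count{"$tmp$1"}++ if $seq =~ m/\G(.)/;   # dimer: peek, pos() unchanged
    }
    return \%count;
}

my $seq   = "ACGGCGATTATTTGGGGGCGGCAATATG";
my $count = count_residues($seq);

my ( $singles, $pairs ) = ( 0, 0 );
for my $key ( keys %$count ) {
    if ( length($key) == 1 ) { $singles += $count->{$key} }
    else                     { $pairs   += $count->{$key} }
}

# Every residue is counted once; every adjacent pair (n - 1 of them) once.
printf "length %d: %d singles, %d pairs\n", length($seq), $singles, $pairs;
```

Because the peek does not move pos(), overlapping pairs are all counted: the run GGGGG contributes four GG dimers.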


	John Kloss <jkloss@sapiens.wustl.edu>
	Systems Admin., Database Admin., Programmer.

	Gish Lab, Genome Sequencing Center
	Washington University Medical School ... in St. Louis


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]
On Behalf Of simon andrews (BI)
Sent: Thursday, October 10, 2002 7:06 AM
To: bioperl-l@bioperl.org
Subject: RE: [Bioperl-l] tr/// and eval



> -----Original Message-----
> From: Pat Schloss [mailto:pds@plantpath.wisc.edu]
> Sent: Thursday, October 10, 2002 1:34 PM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] tr/// and eval
> 
> 
> Hi all, 
> 
> I'm trying to write a program with the simple task of counting the
> number of A's, T's, G's, and C's (and binary combinations of those
> letters, i.e. AA, AG, AC...) in a DNA sequence string. 
> 
> What I'd like to do is pretty much expressed in the code below.  The
> problem I'm having is that when I do the tr/// I'm trying to 
> interpolate into the search a different letter of my modified
> alphabet.  
>
> ${$_} = ($ecoli=~ tr/$_//); 

You have two problems here.  The first is the one you spotted: tr
doesn't interpolate, because its translation table is built at compile
time.  For the purpose you described it is also the wrong tool: tr
operates on single characters, and you said you also wanted to spot
dinucleotides, so in your case it's better to use the m// construct
instead (the m is optional when you use slashes as delimiters).
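If you do still want tr/// for the single letters, the eval from the subject line is the usual workaround: build the tr/// as a string so the letter is interpolated before it is compiled.  A minimal sketch, assuming the letters come from a fixed, trusted list (never eval untrusted input like this):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ecoli = 'ACGGCGATTATTTGGGGCGGCAATATG';
my %counts;

foreach my $base (qw(A T C G)) {
    # tr/// builds its table at compile time, so the letter has to be
    # interpolated into a string that eval compiles at run time.
    # An empty replacement list leaves $ecoli unchanged; tr returns the count.
    my $n = eval "\$ecoli =~ tr/$base//";
    die $@ if $@;
    $counts{$base} = $n;
}

print "$_ occurred $counts{$_} times\n" for sort keys %counts;
```

This only helps for single characters; the dinucleotides still need m//.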

The other problem is the ${$_} construct.  This is called a symbolic
reference, and it is a bad idea (use strict forbids it outright).
There's (nearly) always a better way to do what you want than using a
symref.  For the full details of why, consult the Perl FAQ:

perldoc -q 'How can I use a variable as a variable name?'

In your case you'd be better off storing the counts in a hash, so try
the following code:

#!/usr/bin/perl -w
use strict;

my $ecoli = 'ACGGCGATTATTTGGGGCGGCAATATG';

my %counts;

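foreach (qw(A T C G AA GG CC TT)){
  # Assigning the match list to () in scalar context yields the number
  # of matches; m//g resumes after each match, so counts are non-overlapping.
  $counts{$_} = () = $ecoli=~/$_/g;
}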

foreach (sort keys(%counts)){
  print "$_ occurred $counts{$_} times\n";
}
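Two things to note about the loop above: it lists only four of the sixteen dinucleotides, and because m//g resumes scanning after each match it counts non-overlapping occurrences (GGGG gives two GG, not three).  If overlapping counts are wanted, which is what the single-pass \G loop produces, a zero-width lookahead is one way to get them.  A sketch, assuming overlapping counts are the goal:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ecoli = 'ACGGCGATTATTTGGGGCGGCAATATG';
my @bases = qw(A C G T);
my %counts;

# Build all 16 dinucleotides: AA, AC, ..., TT.
my @dimers;
for my $first (@bases) {
    push @dimers, "$first$_" for @bases;
}

foreach my $pair (@bases, @dimers) {
    # The lookahead matches zero characters, so successive matches can
    # overlap: "GGGG" yields three GG hits instead of the two /GG/g finds.
    $counts{$pair} = () = $ecoli =~ /(?=$pair)/g;
}

foreach (sort keys %counts) {
    print "$_ occurred $counts{$_} times\n";
}
```

With overlapping counts the sixteen dinucleotide totals add up to the sequence length minus one.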
_______________________________________________
Bioperl-l mailing list
Bioperl-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-l