Bioperl: "lightweight analyses"

Steve Chervitz sac@neomorphic.com (Steve A. Chervitz)
Mon, 8 Mar 1999 11:54:02 -0800 (PST)


Andrew Dalke writes:
 >
 > #!/usr/local/bin/perl -pw
 > BEGIN {
 >    %color = ( "A" => "red", "T" => "yellow",
 >               "C" => "blue", "G" => "green");
 > }
 > s|([ATCG])|"<font color='$color{$1}'>$1</font>"|eg if !/^>/; 
 > 
 > which colorizes DNA sequence in FASTA format for HTML, based on
 > residue name.  Writing it caused me to go on a (small) perl jag :)
 > 
 > Hmm, a better s// might be:
 > 
 > s!(A+|T+|C+|G+)!"<font color='" . $color{substr($1, 0, 1)} .
 >                 "'>$1</font>"!eg if !/^>/; 
 > 
 > which keeps from having duplicate color changes in a row.

Nice. With just a little bit more, you can support uppercase and
lowercase sequences, count the number of sequences, and properly
handle newlines: 

#!/usr/local/bin/perl -pw
BEGIN {
   %color = ( "A" => "red", "T" => "yellow",
              "C" => "blue", "G" => "green");
   $count = 0;
   print "<pre>";
}
END {
   print "</pre>";
   print STDERR "$count sequences processed.\n";
}
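# /i matches runs in either case; \u upper-cases the first letter of the
# run so the %color lookup still works. Header lines (/^>/) are counted
# and left untouched.
s!(A+|T+|C+|G+)!"<font color='" . $color{substr("\u$1", 0, 1)} .
                "'>$1</font>"!egi if not(/^>/ and ++$count);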

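Note that you need to use ++$count and not $count++: on the first
header line $count++ would return 0 (false), so the not(...) test
would come out true and that header would get colorized too.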
Extra credit for a regexp that properly handles newlines without
resorting to the use of <pre></pre> (though I kind of prefer looking
at monospaced sequences).
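
For anyone who wants to see the ++$count point in isolation, here is a
minimal sketch. It runs the same guard over a made-up four-line FASTA
input and records which lines each form would let through to the s///.
The sample lines and the names @lines, $pre, $post, @pre_hits and
@post_hits are just for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Made-up input: two header lines and two sequence lines.
my @lines = (">seq1\n", "ATCG\n", ">seq2\n", "GGCC\n");

my ($pre, $post) = (0, 0);     # one counter per guard variant
my (@pre_hits, @post_hits);    # lines each guard would hand to s///

for my $line (@lines) {
    # Same guard as in the script: header lines are counted and skipped.
    push @pre_hits,  $line if not($line =~ /^>/ and ++$pre);
    # Post-increment returns the old value, which is 0 on the first header.
    push @post_hits, $line if not($line =~ /^>/ and $post++);
}

print "pre-increment colorizes:  ", scalar(@pre_hits),  " lines\n";  # 2
print "post-increment colorizes: ", scalar(@post_hits), " lines\n";  # 3
```

Both counters end up at 2, so the count itself is right either way;
the only difference is that the post-increment form lets the first
header line (">seq1") slip through to the substitution.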

SteveC
sac@neomorphic.com