[Bioperl-l] K-mer generating script

Cook, Malcolm MEC at stowers.org
Tue Jan 6 09:26:07 EST 2009


Oh, and if you need the results in a Perl array, and you're running under some Unix, try the even terser:

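#!/usr/bin/env perl
my $k = shift;
# split ' ' (not / /) also discards the trailing newline from echo;
# needs a /bin/sh that does brace expansion (bash does, dash does not)
my @kmer = split ' ', `echo @{['{A,T,G,C}' x $k]}`;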
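
If the shell behind the backticks does not do brace expansion, the one-liner above just hands back the literal braces. A minimal sketch of a shell-free variant, assuming Perl's built-in glob (which does csh-style brace expansion on its own) is acceptable and k is at least 1; the sub name kmers_by_glob is made up for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper name; relies only on Perl's built-in glob.
sub kmers_by_glob {
    my ($k) = @_;
    # '{A,T,G,C}' x $k builds e.g. '{A,T,G,C}{A,T,G,C}' for k = 2.
    # The pattern has no *, ? or [ wildcards, so glob only expands
    # the braces and does not need matching files to exist.
    return glob('{A,T,G,C}' x $k);
}

my @kmer = kmers_by_glob(3);
print scalar(@kmer), " 3-mers generated\n";
```

Call it in list context, as here; in scalar context glob acts as an iterator and returns one item per call.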


--Malcolm Cook

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Cook, Malcolm
> Sent: Monday, January 05, 2009 1:15 PM
> To: 'Chris Fields'; 'Jason Stajich'
> Cc: 'bioperl list'; 'Mark A. Jensen'; Blanchette, Marco
> Subject: Re: [Bioperl-l] K-mer generating script
>
> Gang,
>
> I couldn't resist adding the following non-perl solution...
>
> #!/bin/bash
> k=$1
> # a string with $k blanks
> s=$( printf "%${k}s" )
> # substitute '{A,T,G,C}' for each of the k blanks
> s=${s// /{A,T,G,C\}}
> echo 'kmers using bash to expand:' $s > /dev/stderr
> # let brace expansion of inferior bash compute the cross product
> bash -c "echo  $s"
>
> -- Malcolm
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris
> > Fields
> > Sent: Friday, December 19, 2008 11:54 PM
> > To: Jason Stajich
> > Cc: bioperl list; Mark A. Jensen; Blanchette, Marco
> > Subject: Re: [Bioperl-l] K-mer generating script
> >
> > To add to the pile:
> >
> > Mark-Jason Dominus tackles this problem in Higher-Order Perl using
> > iterators, which also allows other nifty bits like 'give variants of
> > A(CTG)T(TGA)', where anything in parentheses is a wild-card.  The
> > nice advantage of the iterator approach is you don't tank memory for
> > long strings.  Furthermore, as a bonus, you can now download the
> > book for free:
> >
> > http://hop.perl.plover.com/book/
> >
> > The relevant chapter is here (p. 135):
> >
> > http://hop.perl.plover.com/book/pdf/04Iterators.pdf
> >
> > chris
> >
> > On Dec 19, 2008, at 11:02 PM, Jason Stajich wrote:
> >
> > > Does someone want to put this on the wiki too?
> > >
> > > Maybe we could start a little bit of perl snippets for examples
> > > like these.
> > >
> > > -j
> > > On Dec 19, 2008, at 7:45 PM, Mark A. Jensen wrote:
> > >
> > >> A little sloppy, but it recurses and is general---
> > >>
> > >> # example
> > >> my @combs = doit(3, [ qw( A T G C ) ]);
> > >>
> > >> sub doit {
> > >>   my ($n, $sym) = @_;
> > >>   my @store;
> > >>   doit_guts($n, $sym, \@store, '');
> > >>   return @store;
> > >> }
> > >>
> > >> sub doit_guts {
> > >>   my ($n, $sym, $store, $str) = @_;
> > >>   if (!$n) {
> > >>     push @$store, $str;   # leaf: a complete string of length n
> > >>     return;
> > >>   }
> > >>   foreach my $s (@$sym) {
> > >>     doit_guts($n-1, $sym, $store, $str.$s);
> > >>   }
> > >> }
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: "Blanchette, Marco" <MAB at stowers-institute.org>
> > >> To: <bioperl-l at lists.open-bio.org>
> > >> Sent: Friday, December 19, 2008 6:25 PM
> > >> Subject: [Bioperl-l] K-mer generating script
> > >>
> > >>
> > >>> Dear all,
> > >>>
> > >>> Does anyone have a little function that I could use to generate
> > >>> all possible k-mer DNA sequences? For instance, all possible
> > >>> 3-mers (AAA, AAT, AAC, AAG, etc.). I need something where I could
> > >>> input the value of k and get all possible sequences...
> > >>>
> > >>> I know that it's a problem that needs recursive programming, but
> > >>> I can't get my brain around the problem.
> > >>>
> > >>> Many thanks
> > >>>
> > >>> Marco
> > >>> --
> > >>> Marco Blanchette, Ph.D.
> > >>> Assistant Investigator
> > >>> Stowers Institute for Medical Research 1000 East 50th St.
> > >>>
> > >>> Kansas City, MO 64110
> > >>>
> > >>> Tel: 816-926-4071
> > >>> Cell: 816-726-8419
> > >>> Fax: 816-926-2018
> > >>>
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>
> > >>
> > >
> > > Jason Stajich
> > > jason at bioperl.org
> > >
> > >
> > >
> >
> >
>
>
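
The iterator approach Chris mentions in the quoted thread has no code attached to it there. Below is a rough sketch of the idea, not code from Higher-Order Perl: a closure that keeps an "odometer" of alphabet indices and hands back one k-mer per call, so the full list never has to sit in memory. The name kmer_iterator and the default alphabet order are assumptions made for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the book): returns a closure that yields
# one k-mer per call and undef once every k-mer has been produced.
sub kmer_iterator {
    my ($k, @alphabet) = @_;
    @alphabet = qw(A T G C) unless @alphabet;
    my @idx  = (0) x $k;      # odometer: one alphabet index per position
    my $done = 0;
    return sub {
        return undef if $done;
        my $kmer = join '', @alphabet[@idx];
        # Advance the odometer from the rightmost position, carrying left.
        my $pos = $k - 1;
        while ($pos >= 0 && ++$idx[$pos] == @alphabet) {
            $idx[$pos] = 0;
            $pos--;
        }
        $done = 1 if $pos < 0;   # carried off the left end: exhausted
        return $kmer;
    };
}

my $next = kmer_iterator(2);
while (defined(my $kmer = $next->())) {
    print "$kmer\n";
}
```

Only the k indices are stored between calls, so memory use stays small however many k-mers there are to walk through.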
