[BioPython] comparing short sequences against genome

Harry Mangalam hjm at tacgi.com
Fri Sep 24 14:02:38 EDT 2004

Not Biopython per se, but I was just playing around with a short 
script that does exactly this for oligos of length 2-10.  Currently 
it emits XML for ggobi input, but it would be easy to reformat those 
statements if you just wanted raw numeric output.

It's not a shift/add core, so it's not particularly fast (~18s to 
analyze 8-mers on 140K), it doesn't deal with degenerate nucs, and 
it's clunky in a number of other ways as well, but you're welcome to 
plink at it.  Because of the way it does bookkeeping, it's memory 
limited: usage grows as (# seqs) x (size of hash) x (# hits), so it's 
not well suited to large word sizes (>10 or so).  You'd have to make 
up your own linked-list-like storage for longer words, I think.
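
To make the idea concrete, here is a minimal sketch of word counting 
in Python.  It is not the script described above; the function names 
and the toy sequence are made up for illustration.  The first 
function stores only the words that actually occur, so memory tracks 
the number of distinct words rather than 4^k.  The second is one 
reading of a "shift/add" core (an assumption on my part): each base 
is packed into 2 bits and the window is rolled along the sequence.  
Both skip windows containing anything other than A, C, G or T.

```python
from collections import Counter

# 2-bit code per base; anything else is treated as degenerate/unknown.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(seq, k):
    """Count overlapping k-mers by slicing (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip windows that contain degenerate or unknown bases.
        if set(word) <= set(CODE):
            counts[word] += 1
    return counts


def encode(word):
    """Pack an unambiguous word into an integer, 2 bits per base."""
    value = 0
    for base in word.upper():
        value = (value << 2) | CODE[base]
    return value


def count_kmers_rolling(seq, k):
    """Count k-mers with a rolling 2-bit shift/mask window."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    counts = {}
    value = 0
    valid = 0  # consecutive unambiguous bases seen so far
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            # Degenerate base: restart the window after it.
            value = 0
            valid = 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            counts[value] = counts.get(value, 0) + 1
    return counts


genome = "ACGTACGTTACGNACGT"  # toy sequence, not real data
counts = count_kmers(genome, 4)
rolling = count_kmers_rolling(genome, 4)
```

Words with a count of 1 in such a table occur only once in the 
sequence on the strand scanned; a real primer check would also need 
to look at the reverse complement, which this sketch does not do.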

OT - I used Numeric.reshape to allocate elements, but afaik this only 
allows rectangular arrays.  Does Python have an easy method for 
reallocating memory, like a ***array in C, to handle sparse / 
non-rectangular arrays?  
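
One possible direction, as a sketch only: built-in lists can hold 
rows of different lengths, and a dict keyed by word covers the sparse 
case because absent words take no space.  The helper name and the 
toy sequence below are made up for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer to the list of its start positions (hypothetical)."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        # Each word gets its own list, which grows independently.
        positions[seq[i:i + k]].append(i)
    return positions


hits = kmer_positions("ACGTACGA", 3)

# A ragged "array": one row per word, rows of differing length.
ragged = [hits[word] for word in sorted(hits)]
```

Rows grow with append(), so nothing has to be sized or reallocated 
by hand.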

anyway, email if interested.


On Monday 20 September 2004 7:51 pm, Bzy Bee wrote:
> Hi everyone
> I want to design 15-20 primers for a differential
> display experiment on a bacterial genome. The idea is
> take say 10-15 mer of sequence (from the genome) and
> compare it against the rest to see how many times it
> occurs in the genome, followed by next 10 mer and so
> on.
> Is there anything in biopython that could help me in
> doing this?
> thanks
> JA

Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <<plain text preferred>>
