[BioPython] comparing short sequences against genome
hjm at tacgi.com
Fri Sep 24 14:02:38 EDT 2004
Not biopython per se, but I was just playing around with a short
script that does exactly this for oligos from length 2-10. Currently
it emits xml for ggobi input, but it would be easy to reformat those
statements if you just wanted raw numeric output.
It's not a shift/add core so it's not particularly fast (~18s to
analyze 8-mers on 140K), it doesn't deal with degenerate nucs, and
it's clunky in a number of other ways as well, but you're welcome to
plink at it. Because of the way it does bookkeeping, it's memory
limited: usage grows as # seqs x size of hash x # hits, so it's not
well-suited to large word sizes (>10 or so). You'd have to make up
your own linked-list-like storage for longer words, I think.
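
For reference, here is a rough sketch of what a shift/add core could
look like. It is not the script described above, just an illustration
under some assumptions: each base is packed into 2 bits, the window is
rolled along with a shift and a mask, and any non-ACGT character
(N or a degenerate code) simply resets the window. The names
count_words and decode are made up for this example. Counts go into a
dict, so only words that actually occur take up memory.

```python
from collections import defaultdict

# 2-bit code per base; anything else (N, degenerate codes) resets the window
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def count_words(seq, k):
    """Count every k-mer in seq using a rolling 2-bit shift/add encoding."""
    mask = (1 << (2 * k)) - 1    # keeps only the low 2*k bits
    counts = defaultdict(int)    # sparse: only words actually seen are stored
    word = 0
    valid = 0                    # consecutive unambiguous bases seen so far
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:         # ambiguous base: start the window over
            word = 0
            valid = 0
            continue
        word = ((word << 2) | code) & mask   # shift in the new base
        valid += 1
        if valid >= k:
            counts[word] += 1
    return counts

def decode(word, k):
    """Turn an integer-encoded word back into its base string."""
    return "".join("ACGT"[(word >> (2 * (k - 1 - i))) & 3] for i in range(k))
```

Something like count_words(genome, 10) followed by decode(word, 10) on
the keys would give a per-word occurrence table. Only the forward
strand is counted here; the reverse complement would need its own pass.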
OT - I used Numeric.reshape to allocate elements, but afaik, this only
allows rectangular arrays. Does python have an easy method for
reallocating memory like a ***array in C to handle sparse /
anyway, email if interested.
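
On the rectangular-array question, one possible approach (an
assumption on my part, not something taken from the script above) is
to skip fixed-size arrays and use nested dicts and lists, which grow
as needed and can be ragged. A minimal sketch, with index_hits as a
made-up name:

```python
from collections import defaultdict

def index_hits(seqs, k):
    """Map each k-mer to {sequence index: [start positions]}."""
    # Inner lists are created on first use and can have any length,
    # so nothing is allocated for words or sequences with no hits.
    hits = defaultdict(lambda: defaultdict(list))
    for n, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            hits[seq[i:i + k]][n].append(i)
    return hits
```

Use "word in hits" to test for a word, since looking up a missing key
with hits[word] creates an empty entry.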
On Monday 20 September 2004 7:51 pm, Bzy Bee wrote:
> Hi everyone
> I want to design 15-20 primers for a differential
> display experiment on a bacterial genome. The idea is
> take say 10-15 mer of sequence (from the genome) and
> compare it against the rest to see how many times it
> occurs in the genome, followed by next 10 mer and so on.
> Is there anything in biopython that could help me in
> doing this?
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com
<<plain text preferred>>