[BioPython] comparing short sequences against genome

Mon Sep 27 20:30:00 EDT 2004

Citing Bzy Bee <nomy2020 at yahoo.com>:
<cut>
> What I have been unable to do so far is:
> 
> 1) the oligos (both forward and reverse) to iterate
> through the entire file, i.e. each and every sequence
> in the file (and keeping track of sequence names and
> positions when a match is found). At the moment it
> just takes the first 10 mers of sequence 1 (and 10
> mers at position 290) and compares these with sequence
> 1, but not with sequence 2 and 3 and so on
> 
> 2) secondly, I want to add one to the starting
> position of 10 mers, i.e. in the second round of
> iteration, instead of taking:
>      oligoF = result[0:10] 
> it should take result[1:11], i.e. increasing the
> position of 10 mer by 1 (and same for reverse oligo)
> and so on until the sequence finishes. I'm not sure
> how to increment the result by 1.
> 
> I am kinda stuck at both these steps and any help
> would be very much appreciated.
<cut a lot>

Hi, 

I've attached a modified version of your program that does what you want. It's
not clean, and it's not nearly as fast as it needs to be if you are thinking
about genome-scale experiments.
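
For readers of the archive, here is a minimal sketch of the two steps asked
about (sliding the 10-mer window by one, and checking every sequence rather than
only the first). It is not the attached script. The names sliding_kmers,
find_all and scan are made up for illustration, and the sequences are assumed to
be already loaded into a plain dict of name -> string.

```python
def sliding_kmers(seq, k=10):
    """Yield (start, kmer) for every window of length k in seq."""
    # range(...) moves the window start by one each round: [0:k], [1:k+1], ...
    for start in range(len(seq) - k + 1):
        yield start, seq[start:start + k]


def find_all(text, pattern):
    """Return every start position of pattern in text, overlaps included."""
    positions = []
    pos = text.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = text.find(pattern, pos + 1)
    return positions


def scan(query, targets, k=10):
    """Compare every k-mer of query against every sequence in targets.

    targets is assumed to map sequence name -> sequence string.
    Returns a list of (kmer_start, kmer, target_name, match_positions).
    """
    hits = []
    for start, kmer in sliding_kmers(query, k):
        for name, seq in targets.items():
            positions = find_all(seq, kmer)
            if positions:
                hits.append((start, kmer, name, positions))
    return hits
```

Calling scan(query, targets) once per query sequence covers the whole file. The
cost grows with (number of windows) x (total target length), which is why this
is only reasonable for short inputs.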

If you need to do this just once, and not for very long sequences, you can use
this. Otherwise, I would recommend reading about suffix trees. Honestly, I
don't know whether there is any code related to that in Biopython.
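
A suffix tree is more than fits in a mail, but when the oligo length is fixed, a
hash index of all k-mers gives much of the same speed-up for exact matches. The
sketch below shows that simpler stand-in, not a suffix tree. The names
build_kmer_index and lookup are hypothetical, and the targets are again assumed
to be a dict of name -> sequence string.

```python
from collections import defaultdict


def build_kmer_index(targets, k=10):
    """Map every k-mer to the list of (sequence_name, start) where it occurs."""
    index = defaultdict(list)
    for name, seq in targets.items():
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].append((name, start))
    return index


def lookup(index, query, k=10):
    """Look up each k-mer of query in a prebuilt index.

    k must be the same value that was used to build the index.
    Returns a list of (query_start, kmer, target_name, target_start).
    """
    hits = []
    for start in range(len(query) - k + 1):
        kmer = query[start:start + k]
        # .get avoids creating empty entries in the defaultdict
        for name, pos in index.get(kmer, ()):
            hits.append((start, kmer, name, pos))
    return hits
```

The index is built once in time proportional to the total target length, after
which each window costs one dictionary lookup. The price is memory: every k-mer
position is stored, so for real genomes a suffix tree or suffix array is still
worth reading about.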

regards 

Bartek Wilczynski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fasta_compare.py
Type: text/x-python
Size: 2219 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20040928/af68a067/fasta_compare.py