[EMBOSS] anarchistic results in einverted

Thu Jun 23 09:03:43 UTC 2011

Hi Heikki,

On 06/22/11 13:53, Heikki Salavirta wrote:
> Hello,
>
> I've been running einverted queries with the same sequence, with the
> only difference in parameters being max repeat (e.g. 1000, 2000, 4000,
> 6000, 8000, 10000 & 20000). Other parameters are gap: 12, threshold: 50,
> match: 3 & mismatch: -4.
>
> I'd expect that this would result in an ever increasing number of
> results, as the max repeat parameter increases. However, this is not
> what I'm seeing.
>
> E.g. when max repeat is 2000, there are 3 results from the sequence
> prior to >10kb loci.
>
> It's the same result when max repeat is 4000.
>
> However, when max repeat is 6000, there are 2 results prior to >10kb
> loci, and 1 of them is not reported by 2000 & 4000 queries. In this
> particular result the gap between the inverted repeats is only 2
> nucleotides!
>
> When max repeat is 8000, there's only 1 result prior to >10kb loci,
> which is also reported by 2000, 4000 & 6000 queries.
>
> When max repeat is 1000, there are 4 results prior to >10kb loci.
>
> Could somebody perhaps explain these unexpected results, and perhaps
> suggest proper parameters for finding all inverted repeats in a >150kb
> sequence?

einverted was designed for the annotation of the Caenorhabditis elegans 
genome. It deliberately does not find all inverted repeats.

The algorithm searches the max repeat region and reports the 
highest-scoring repeat.

It can also report other high-scoring repeats in the region, but only if 
they are not already covered by a reported result.
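A minimal sketch of the reporting rule described above may help explain why a larger max repeat can give fewer hits. It is a simplified model, not einverted's code. The `Hit` tuple, the `covered` overlap test, and treating max repeat as a cap on the span from the start of the left arm to the end of the right arm are all assumptions made for illustration. Candidates are taken best score first, and one is kept only if it clears the threshold, fits in the window, and does not overlap anything already reported.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical candidate inverted repeat (1-based inclusive arm coordinates)."""
    score: int
    left_start: int
    left_end: int
    right_start: int
    right_end: int

    def span(self) -> Tuple[int, int]:
        # Assumption: a hit "covers" everything from its left arm to its right arm.
        return (self.left_start, self.right_end)


def covered(hit: Hit, reported: List[Hit]) -> bool:
    """True if hit's span overlaps the span of any already-reported hit."""
    start, end = hit.span()
    return any(start <= r_end and r_start <= end
               for r_start, r_end in (r.span() for r in reported))


def report_hits(candidates: List[Hit], threshold: int, max_repeat: int) -> List[Hit]:
    """Greedy model: best score first; skip low scores, oversized spans, overlaps."""
    reported: List[Hit] = []
    for hit in sorted(candidates, key=lambda h: h.score, reverse=True):
        start, end = hit.span()
        if hit.score < threshold or end - start + 1 > max_repeat:
            continue  # below threshold, or does not fit in the max repeat window
        if not covered(hit, reported):
            reported.append(hit)
    return sorted(reported, key=lambda h: h.left_start)


# Made-up candidates: two short repeats and one long repeat enclosing both.
candidates = [
    Hit(60, 100, 130, 160, 190),
    Hit(70, 400, 440, 470, 510),
    Hit(150, 90, 300, 320, 520),
]
print([h.score for h in report_hits(candidates, threshold=50, max_repeat=200)])   # [60, 70]
print([h.score for h in report_hits(candidates, threshold=50, max_repeat=1000)])  # [150]
```

With the small window the long candidate is excluded and both short repeats are reported; once the window admits it, it is reported first and covers the other two, so the hit count drops. The scores and coordinates are invented purely to show the effect.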

We have changed the way overlapping scores in the max repeat region are 
detected (the original program had a problem when the repeat traceback 
ran past the end of the region), but this should not significantly 
change the number of reported hits.

As for finding all inverted repeats, I do not think that is possible 
with einverted: it uses a dynamic programming approach and will fail to 
find overlapping repeats.
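To see why, it helps to look at what a basic local-alignment dynamic programming pass computes: a single best-scoring path. The sketch below is a plain Smith-Waterman with a linear gap penalty that aligns a sequence against its own reverse complement, using the match/mismatch/gap values from the question (3, -4, 12). It is an illustration of the general technique only, not einverted's implementation; the linear gap model and the function names are assumptions.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_local_alignment(a: str, b: str, match: int = 3, mismatch: int = -4,
                         gap: int = 12) -> Tuple[int, int, int]:
    """Smith-Waterman with a linear gap penalty (assumed gap model).

    Returns (best score, end position in a, end position in b). Only the single
    highest-scoring cell is kept, so only one optimal path can be traced back.
    """
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            cur[j] = max(0, diag, prev[j] - gap, cur[j - 1] - gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


def inverted_repeat_score(seq: str) -> Tuple[int, int, int]:
    """Score the best inverted repeat as a local alignment with the reverse complement."""
    return best_local_alignment(seq, reverse_complement(seq))


print(inverted_repeat_score("GGGGGCCCCC"))  # (30, 10, 10): ten matches at +3
print(inverted_repeat_score("AAAAAAAA"))    # (0, 0, 0): no complementary bases
```

Because the pass keeps one optimum, a second repeat whose path shares cells with the best one is absorbed into it rather than reported separately, which is the kind of overlap limitation described above.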

We would be interested in any other algorithms we could implement if 
there are no open-source applications available for them.

Hope this helps

Peter Rice
EMBOSS Team