[EMBOSS] anarchistic results in einverted
Peter Rice
pmr at ebi.ac.uk
Thu Jun 23 09:03:43 UTC 2011
Hi Heikki,
On 06/22/11 13:53, Heikki Salavirta wrote:
> Hello,
>
> I've been running einverted queries with the same sequence, with the
> only difference in parameters being max repeat (e.g. 1000, 2000, 4000,
> 6000, 8000, 10000 & 20000). Other parameters are gap: 12, threshold: 50,
> match: 3 & mismatch: -4.
>
> I'd expect that this would result in an ever increasing number of
> results, as the max repeat parameter increases. However, this is not
> what I'm seeing.
>
> E.g. when max repeat is 2000, there are 3 results from the sequence
> prior to >10kb loci.
>
> It's the same result when max repeat is 4000.
>
> However, when max repeat is 6000, there are 2 results prior to >10kb
> loci, and 1 of them is not reported by 2000 & 4000 queries. In this
> particular result the gap between the inverted repeats is only 2
> nucleotides!
>
> When max repeat is 8000, there's only 1 result prior to >10kb loci,
> which is also reported by 2000, 4000 & 6000 queries.
>
> When max repeat is 1000, there are 4 results prior to >10kb loci.
>
> Could somebody perhaps explain these unexpected results, and perhaps
> suggest proper parameters for finding all inverted repeats in a >150kb
> sequence?
einverted was designed for the annotation of the Caenorhabditis elegans
genome. It deliberately does not find all inverted repeats.
The algorithm searches the max repeat region and reports the
highest-scoring repeat.
It can also report other high-scoring repeats in the region, but only if
they are not already covered by a reported result.
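
To illustrate why a larger max repeat can give fewer results, here is a minimal Python sketch of that kind of selection rule. It is not the einverted source code. The `Hit` type, the coordinates, the scores and the `report_hits` function are all hypothetical, and it assumes max repeat simply limits the total span of a candidate. Once a long, high-scoring repeat fits inside the region, it covers the shorter repeats that were reported before.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A candidate inverted repeat (hypothetical record, not an EMBOSS type)."""
    start: int  # first base of the left arm
    end: int    # last base of the right arm
    score: int


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two candidates share at least one base."""
    return a.start <= b.end and b.start <= a.end


def report_hits(candidates: List[Hit], max_repeat: int) -> List[Hit]:
    """Greedy selection: best score first, skip anything already covered.

    Assumption: max_repeat limits the total span (end - start + 1).
    """
    eligible = [h for h in candidates if h.end - h.start + 1 <= max_repeat]
    reported: List[Hit] = []
    for hit in sorted(eligible, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, kept) for kept in reported):
            reported.append(hit)
    return sorted(reported, key=lambda h: h.start)


# Made-up candidates: three short repeats and one long one spanning them all.
candidates = [
    Hit(100, 400, 60),
    Hit(900, 1300, 55),
    Hit(2500, 2900, 70),
    Hit(50, 5000, 120),
]

for max_repeat in (2000, 6000):
    print(max_repeat, report_hits(candidates, max_repeat))
```

With these made-up numbers, a max repeat of 2000 reports the three short repeats, while 6000 reports only the long one, because it scores highest and covers the other three.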
We have changed the way overlapping scores in the max repeat region
are detected (the original program had a problem when the repeat
traceback went past the end of the region), but this should not
significantly change the number of reported hits.
As to finding all inverted repeats ... I think that is not possible with
einverted because it uses a dynamic programming approach and will fail
to find overlapping repeats.
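
For comparison, an exhaustive search that does keep overlapping candidates can be sketched in a few lines of Python. This is a different technique from einverted: it finds only exact reverse-complement k-mer pairs (seeds), with no gaps, mismatches or scoring, and it would need an extension step to be useful on real data. The function names and the `k` and `max_span` parameters are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_inverted_seeds(seq: str, k: int = 4,
                         max_span: int = 20) -> List[Tuple[int, int]]:
    """List every (left_start, right_start) pair, 0-based, where the k-mer at
    right_start is the reverse complement of the k-mer at left_start.

    The two arms may not overlap each other, and the distance from the start
    of the left arm to the end of the right arm may not exceed max_span.
    Pairs that overlap other pairs are all kept.
    """
    seq = seq.upper()
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    pairs: List[Tuple[int, int]] = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in positions.get(partner, ()):
            if j >= i + k and j + k - i <= max_span:
                pairs.append((i, j))
    return pairs


# "AAGC" at position 0 pairs with its reverse complement "GCTT" at position 7.
print(exact_inverted_seeds("AAGCTTTGCTT", k=4, max_span=20))  # [(0, 7)]
```

Because nothing is filtered out, the number of pairs can only grow as `max_span` increases, which is the behaviour expected in the original question. The cost is a large number of redundant, overlapping seeds on a long sequence.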
We would be interested in any other algorithms we could implement if
there are no open source applications available for them.
Hope this helps
Peter Rice
EMBOSS Team