[Biojava-l] BioJava on large sequences
Thomas Down
td2@sanger.ac.uk
Sun, 23 Sep 2001 19:18:14 +0100
On Fri, Sep 21, 2001 at 11:45:29PM +0100, David Huen wrote:
> On Fri, 21 Sep 2001, Cox, Greg wrote:
>
> > Has anyone done work with BioJava on very large sequences (i.e. contigs)?
> > The types of issues we're thinking about are keeping a sub-set of the
> > sequence in memory, but ensuring that the indices of the bases are accurate.
> > Has anyone dealt with this?
> >
>
> I hope I've understood the question correctly but I think the BioJava DAS
> client does this sort of thing.
The DAS code does indeed handle fetch-on-demand, and reasonably
intelligent caching. If a sequence is an assembly of small
components (clones or whatever), fetching happens by clone. Large
sequences (chromosomes where no assembly information is available)
get split into `tiles'.
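
To illustrate the tiling idea, here is a minimal sketch in plain Java.
It is not the DAS client's actual code: the class TiledSequenceCache,
the Fetcher interface, the tile size and the LRU eviction policy are
all assumptions made for this example. It shows how a fixed tile size
lets you keep only part of a sequence in memory while the base
coordinates (1-based, inclusive) stay accurate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical fetch-on-demand cache; not part of BioJava. */
public class TiledSequenceCache {
    /** Hypothetical back end: returns bases start..end (1-based, inclusive). */
    public interface Fetcher {
        String fetch(long start, long end);
    }

    private final long length;
    private final int tileSize;
    private final Fetcher fetcher;
    private final Map<Long, String> tiles;
    private int fetchCount = 0;

    public TiledSequenceCache(long length, int tileSize, final int maxTiles, Fetcher fetcher) {
        this.length = length;
        this.tileSize = tileSize;
        this.fetcher = fetcher;
        // An access-ordered LinkedHashMap gives a simple LRU cache:
        // the least recently used tile is dropped once maxTiles is exceeded.
        this.tiles = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > maxTiles;
            }
        };
    }

    /** Returns the base at a global 1-based position, fetching its tile if needed. */
    public char symbolAt(long pos) {
        if (pos < 1 || pos > length) {
            throw new IndexOutOfBoundsException("Position out of range: " + pos);
        }
        long tileIndex = (pos - 1) / tileSize;
        String tile = tiles.get(tileIndex);
        if (tile == null) {
            // Convert the tile index back to global coordinates for the fetch.
            long start = tileIndex * tileSize + 1;
            long end = Math.min(start + tileSize - 1, length);
            tile = fetcher.fetch(start, end);
            fetchCount++;
            tiles.put(tileIndex, tile);
        }
        // The offset within the tile is the global position modulo the tile size.
        return tile.charAt((int) ((pos - 1) % tileSize));
    }

    /** Returns bases start..end (1-based, inclusive), possibly spanning several tiles. */
    public String subStr(long start, long end) {
        StringBuilder sb = new StringBuilder();
        for (long pos = start; pos <= end; pos++) {
            sb.append(symbolAt(pos));
        }
        return sb.toString();
    }

    /** Number of back-end fetches so far; useful for checking cache behaviour. */
    public int getFetchCount() {
        return fetchCount;
    }
}
```

A caller would supply a Fetcher that reads from a database or a remote
server. Caching by clone or raw contig works the same way, except that
the tile boundaries come from the assembly rather than a fixed size.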
Another example of this is the biojava-ensembl code. Again,
this caches by raw contigs. I've quite happily worked with
human chromosomes (some 200-ish megabases) using this code
(and of course the human DAS server uses it).
Thomas.