[Bioperl-l] Bio::Seq, search for specific features

Jun Yin jun.yin at ucd.ie
Thu Sep 9 08:20:39 UTC 2010


Hi,

I would like to have a go at implementing the bin indexing scheme for
Bio::Seq (or a package similar to Bio::LocatableSeq). The idea is to save
the sequence index to a local database (AnyDBM) instead of holding it in
memory, which should reduce memory usage. The idea comes from
Bio::DB::Fasta, as implemented by Lincoln Stein.
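
To make the idea concrete, here is a rough sketch of a bin index kept on
disk through AnyDBM_File. It is only an illustration under some
assumptions: the fixed bin width, the "bin:"/"loc:" key layout and the
helper names (bins_for, add_feature, features_in_range) are all made up
for this example and are not part of BioPerl, and features are reduced to
an id plus start/end coordinates.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use AnyDBM_File;                # uses whichever DBM library is installed
use File::Temp qw(tempdir);

# Hypothetical fixed bin width; real schemes often use several bin tiers.
my $BIN_SIZE = 1000;

# Return the bin numbers that a 1-based, inclusive range overlaps.
sub bins_for {
    my ($start, $end) = @_;
    return (int(($start - 1) / $BIN_SIZE) .. int(($end - 1) / $BIN_SIZE));
}

# The index lives in a DBM file on disk rather than in a Perl hash in memory.
my $dir = tempdir(CLEANUP => 1);
tie my %index, 'AnyDBM_File', "$dir/features", O_RDWR | O_CREAT, 0644
    or die "Cannot open index: $!";

# Record a feature id under every bin it overlaps, plus its coordinates.
sub add_feature {
    my ($id, $start, $end) = @_;
    for my $bin (bins_for($start, $end)) {
        my $old = $index{"bin:$bin"};
        $index{"bin:$bin"} = defined $old ? "$old,$id" : $id;
    }
    $index{"loc:$id"} = "$start,$end";
}

# Return the sorted ids of features that overlap the query range.
sub features_in_range {
    my ($start, $end) = @_;
    my %seen;
    for my $bin (bins_for($start, $end)) {
        my $ids = $index{"bin:$bin"};
        next unless defined $ids;
        for my $id (split /,/, $ids) {
            # Bins only narrow the candidates; confirm the real overlap.
            my ($fs, $fe) = split /,/, $index{"loc:$id"};
            $seen{$id} = 1 if $fs <= $end && $fe >= $start;
        }
    }
    return sort keys %seen;
}

add_feature('geneA', 100,  2500);
add_feature('geneB', 5200, 5300);
add_feature('geneC', 2400, 2600);

print join(',', features_in_range(2450, 2550)), "\n";   # prints geneA,geneC
```

A range query only reads the bins it overlaps, so memory use stays small
however many features are indexed. One caveat: if AnyDBM_File falls back
to SDBM_File, each key/value pair is limited to roughly 1 KB, so a densely
populated bin would need a different DBM library or a different value
layout.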

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, September 09, 2010 12:20 AM
To: Frank Schwach
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Seq, search for specific features

Well, no concrete move has been made yet. It would be nice to abstract
the backend, so one could use possibly any db or in-memory adaptor. This is
essentially the direction I would like to take the alignment data as well
(part of the GSoC project for BioPerl this year was to tackle this very
thing).
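
As a rough illustration of what an abstracted backend could look like,
here is a minimal sketch. The class name FeatureStore::Memory and its
add/find_by_tag methods are hypothetical and not part of BioPerl, and
features are plain hashrefs rather than Bio::SeqFeatureI objects. The
in-memory version also keeps a hash of tag values so lookups avoid a grep
over every feature.

```perl
use strict;
use warnings;

# Hypothetical in-memory backend; features are plain hashrefs here.
package FeatureStore::Memory;

sub new { return bless { features => [], by_tag => {} }, shift }

sub add {
    my ($self, $feat) = @_;
    push @{ $self->{features} }, $feat;
    # Index every tag/value pair as the feature is added.
    while (my ($tag, $values) = each %{ $feat->{tags} }) {
        push @{ $self->{by_tag}{$tag}{$_} }, $feat for @$values;
    }
    return $self;
}

# Return all features carrying the given tag with the given value.
sub find_by_tag {
    my ($self, $tag, $value) = @_;
    return @{ $self->{by_tag}{$tag}{$value} || [] };
}

package main;

# Callers depend only on add() and find_by_tag(), so a DBM- or
# SQLite-backed class offering the same two methods could be swapped in.
my $store = FeatureStore::Memory->new;
$store->add({ start => 10, end => 90,  tags => { id => ['my_gene'] } });
$store->add({ start => 50, end => 400, tags => { id => ['other'], note => ['x'] } });

my @features = $store->find_by_tag(id => 'my_gene');
print scalar(@features), " feature(s) found\n";   # prints 1 feature(s) found
```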

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planned for the near future?
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep {
>>     $_->has_tag('id') && ($_->get_tag_values('id'))[0] eq 'my_gene'
>> } $seq->get_SeqFeatures();
>> 
>> I think any implementation would come with a move from the in-memory
>> array- and hash-based system to a sqlite db on the back-end for how
>> Sequence and Feature objects are stored.
>> This would be somewhat slower but wouldn't have the performance/memory
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com


 
