[Bioperl-l] truncating a sequence and remapping annotations
Jason Stajich
jason at bioperl.org
Thu Aug 27 14:41:28 EDT 2009
Yeah, one thought that we batted around at a hackathon many moons ago
was to use Bio::DB::SeqFeature in a lightweight way under the hood to
represent sequences in layers, rather than the arbitrary data model
that was set up by focusing on handling GenBank records. A lot of the
architecture development (that is like 10-15 years old now!) was
initially just focused on round-tripping the sequence files. More
recently we have felt that a new model would be more appropriate. With
the fast SQLite implementation that Lincoln has put in for
DB::SeqFeature we could in theory map every sequence into a SQLite DB
and then have the power of that interface.
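
To make the idea concrete, here is a minimal sketch in Python using
only the standard sqlite3 module. It is not the Bio::DB::SeqFeature
schema or API; the table layout, the names build_store() and
truncate(), and the toy data are all assumptions made up for
illustration. It shows how, once a sequence and its features sit in
SQLite, truncating to a window and remapping the annotations reduces
to a substring query plus a coordinate shift.

```python
import sqlite3


def build_store():
    """Create a toy in-memory store: one 400 nt sequence and three features.

    The schema is hypothetical and far simpler than a real feature store.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sequence (seq_id TEXT PRIMARY KEY, residues TEXT)")
    db.execute(
        "CREATE TABLE feature "
        "(seq_id TEXT, name TEXT, f_start INTEGER, f_stop INTEGER, strand INTEGER)"
    )
    db.execute("INSERT INTO sequence VALUES (?, ?)", ("demo", "ACGT" * 100))
    db.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
        [
            ("demo", "gene_a", 160, 200, 1),   # fully inside 150..250
            ("demo", "gene_b", 240, 300, -1),  # overlaps the right edge
            ("demo", "gene_c", 10, 50, 1),     # outside the window
        ],
    )
    return db


def truncate(db, seq_id, start, stop):
    """Return (subsequence, remapped features) for 1-based inclusive start..stop.

    Features overlapping the window are clipped to it and shifted so that
    `start` becomes position 1; features outside the window are dropped.
    """
    offset = start - 1
    # SQLite's substr() is 1-based, which matches sequence coordinates.
    (subseq,) = db.execute(
        "SELECT substr(residues, ?, ?) FROM sequence WHERE seq_id = ?",
        (start, stop - start + 1, seq_id),
    ).fetchone()
    rows = db.execute(
        "SELECT name, f_start, f_stop, strand FROM feature "
        "WHERE seq_id = ? AND f_stop >= ? AND f_start <= ? ORDER BY f_start",
        (seq_id, start, stop),
    ).fetchall()
    remapped = [
        (name, max(f_start, start) - offset, min(f_stop, stop) - offset, strand)
        for name, f_start, f_stop, strand in rows
    ]
    return subseq, remapped


db = build_store()
subseq, features = truncate(db, "demo", 150, 250)
print(len(subseq), features)
```

Clipping partially overlapping features is only one possible policy;
dropping them or flagging them as partial would be equally defensible,
and a real implementation would also have to handle split locations.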
Some more bells and whistles might be needed, but the basic API is
respected AFAIK and it avoids needing to store whole sequences in
memory. The SeqIO->DB::SeqFeature loading would need some finessing
so that the sequence object could be updated efficiently as it is
parsed. Actually this might also help reduce the number of objects
that need to be created, by efficiently serializing sequences into the
DB on parsing (and with some simple caching this could make for a
pretty fast system). Since disk is basically not a limitation now,
this could be an interesting experiment. Maybe it is too out there,
but if not, it could be something major enough that it has to go in a
bioperl-2/bioperl-ng. It sort of assumes the data model of
Bio::DB::SeqFeature is adequate for all the messiness of sequence data
formats, and one problem for some people has been the seq file format
=> GFF conversion needed in order to load it into a SeqFeature DB for
Gbrowse... So I don't know what the boundary cases are here.
Certainly for FASTA it should be straightforward.
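
As a rough illustration of the FASTA case, the following Python sketch
(standard library only) streams a FASTA file into SQLite in
fixed-size chunks, so no whole sequence is ever held in memory, and
then reads back an arbitrary range. The seq_chunk table, the
load_fasta() and fetch_range() names, and the chunk size are
assumptions invented for this example; this is not how SeqIO or
Bio::DB::SeqFeature actually load data.

```python
import io
import sqlite3


def load_fasta(handle, db, chunk_size=1000):
    """Stream FASTA records from `handle` into SQLite, one chunk at a time."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS seq_chunk "
        "(seq_id TEXT, chunk_start INTEGER, residues TEXT, "
        "PRIMARY KEY (seq_id, chunk_start))"
    )
    seq_id, written, buf = None, 0, ""

    def flush(final):
        # Write full chunks; on `final`, also write the short remainder.
        nonlocal written, buf
        while len(buf) >= chunk_size or (final and buf):
            piece, buf = buf[:chunk_size], buf[chunk_size:]
            db.execute(
                "INSERT INTO seq_chunk VALUES (?, ?, ?)", (seq_id, written, piece)
            )
            written += len(piece)

    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                flush(final=True)
            header = line[1:].split()
            # Use the first word of the header as the ID (an assumption).
            seq_id, written, buf = (header[0] if header else ""), 0, ""
        elif seq_id is not None:
            buf += line
            flush(final=False)
    if seq_id is not None:
        flush(final=True)
    db.commit()


def fetch_range(db, seq_id, start, stop):
    """Return residues start..stop (1-based, inclusive), touching only
    the chunks that overlap the requested range."""
    rows = db.execute(
        "SELECT chunk_start, residues FROM seq_chunk "
        "WHERE seq_id = ? AND chunk_start < ? "
        "AND chunk_start + length(residues) >= ? ORDER BY chunk_start",
        (seq_id, stop, start),
    )
    parts = []
    for chunk_start, residues in rows:
        lo = max(start - 1 - chunk_start, 0)
        hi = min(stop - chunk_start, len(residues))
        parts.append(residues[lo:hi])
    return "".join(parts)


db = sqlite3.connect(":memory:")
fasta = io.StringIO(">s1 demo\nACGTAC\nGTTT\n>s2\nGGGCCC\n")
load_fasta(fasta, db, chunk_size=4)
print(fetch_range(db, "s1", 3, 9))
```

The tiny chunk size is only there to exercise the chunk boundaries in
the demo; a real loader would pick something much larger and would
probably batch the inserts.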
-jason
On Aug 27, 2009, at 11:20 AM, Chris Fields wrote:
> It's not implemented completely. As Jason mentioned in the bug
> report, it was meant to be part of an overall system to truncate
> sequences with remapped features, but the implementation in place is
> substandard. It's open for implementation if anyone wants to take
> it up.
>
> I should point out, though, that in my opinion
> Bio::DB::GFF/SeqFeature deals with this in a more elegant and
> lightweight way, and it is probably the direction I would take. YMMV.
>
> chris
>
> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:
>
>> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>>
>> Rob
>>
>> Robert Buels wrote:
>>> Hi all,
>>> Recently a user came into #bioperl looking to truncate an
>>> annotated sequence (leaving the region from e.g. 150 to 250
>>> nt), and have the annotations from the original sequence be
>>> remapped onto the new truncated sequence.
>>> Poking through code, I came across an undocumented function
>>> trunc() that from the comments looks like it was written by Jason
>>> as part of a master plan to implement this very functionality.
>>> Just wondering, what's the status of that?
>>> Rob
>>
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org