[BioRuby] GFF3
Pjotr Prins
pjotr.public14 at thebird.nl
Tue Aug 17 16:38:37 UTC 2010
On Mon, Aug 16, 2010 at 10:52:02PM +0900, Tomoaki NISHIYAMA wrote:
> Hi,
>
>> A simple implementation would be to store all relations into a
>> graph (or graphs) and then extracting information.
>
> I recently wrote a program to extract all the mRNAs, but up to the
> addresses and not to the sequences.
>
> http://github.com/tomoakin/Bioruby-use/blob/master/src/gff2easytrack.rb
>
> This is not designed to be very general, but might be useful as a
> starting point.
Thanks for the nice example. It shows how you can filter GFF without
storing everything in memory. Naturally, that does not work for
extracting all transcripts, as GFF does not guarantee ordered data.
Still, a good example. What I also like is that there is almost no
coupling with other BioRuby modules (other than the embedded FASTA). We
should keep it that way.
Question: have we ever seen GFF files that are not ordered? It makes
so much sense to keep genes and their components together. I think it
is argued somewhere that parts can be shared between genes, but how
often does that happen, and would they be far apart in the file?
Even Lincoln states that you can split GFF files. That would not work
if related data were not kept together.
I am thinking we can assume that related records sit close together in
the file. This means we only have to cache a limited number of records
in memory to resolve dependencies.
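
A rough sketch of what such a bounded cache could look like, assuming that
dependencies are expressed through the GFF3 ID and Parent attributes. The
names RecentRecords, parse_attributes and link_children, and the default
limit of 1000 records, are hypothetical choices for illustration and not an
existing BioRuby API.

```ruby
require 'stringio'

# Parse the ninth GFF3 column ("ID=x;Parent=y") into a Hash.
def parse_attributes(column)
  column.to_s.split(';').each_with_object({}) do |pair, attrs|
    key, value = pair.split('=', 2)
    attrs[key] = value if key && value
  end
end

# Keeps only the most recently stored records, keyed by ID.
class RecentRecords
  def initialize(limit)
    @limit = limit
    @records = {} # Ruby hashes keep insertion order: first key is the oldest
  end

  def store(id, record)
    @records.delete(id)
    @records[id] = record
    @records.shift while @records.size > @limit # evict the oldest entries
  end

  def [](id)
    @records[id]
  end
end

# Returns two lists of [feature_type, parent_id] pairs: those whose parent
# was still in the cache, and those whose parent was not found there.
def link_children(io, limit: 1000)
  cache = RecentRecords.new(limit)
  resolved = []
  unresolved = []

  io.each_line do |line|
    line = line.chomp
    break if line == '##FASTA'
    next if line.empty? || line.start_with?('#')

    fields = line.split("\t", 9)
    attrs = parse_attributes(fields[8])

    # Parent may hold several comma-separated IDs.
    attrs.fetch('Parent', '').split(',').each do |parent_id|
      (cache[parent_id] ? resolved : unresolved) << [fields[2], parent_id]
    end

    cache.store(attrs['ID'], fields) if attrs['ID']
  end

  [resolved, unresolved]
end
```

If the assumption holds, the unresolved list stays empty for a modest limit.
A non-empty list means either the file is not ordered the way we hope or the
limit is too small, and a fallback that stores everything would be needed.
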
I'll probably write something in the coming week, as I need it. For the
time being, I'll design it as a BioRuby plugin.
Pj.