[BioSQL-l] Importing GFF3 files into a BioSQL database?

Brad Chapman chapmanb at 50mail.com
Sun Feb 22 16:18:06 EST 2009


Hi Peter;

> Do any of the Bio* projects currently let you import a GFF3 (or even a
> GFF2) file into a BioSQL database?

Normalizing the GFF and standard SeqIO representations is a great
idea. I use BioSQL quite a bit, and it would be nice to be able to
output GFF formatted files directly from bioentries. 

To get more familiar with BioPerl GFF mappings, I took a look at how
GenBank files get converted to GFF files with BioPerl. Generally,
most things map as you'd expect but a few items are left behind. I
wrote up the details on the current mappings, along with some
proposals for expanding them, here:

http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/

I think for the Biopython mapping we could try to follow what
BioPerl does where it makes sense, and introduce the other
items in a way that is consistent and could be followed by other
projects.
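
As a rough illustration of the table mapping Peter outlines in the
quoted message below (one seqfeature row per feature, one location row
per GFF line, one seqfeature_relationship row per Parent link), here is
a minimal Python sketch. It is not what BioPerl or Biopython actually
do. The function names parse_gff3_line and gff3_to_rows are made up for
this example, the rows are plain dictionaries with simplified column
names rather than the full BioSQL schema, and terms are kept as strings
instead of ontology term IDs. Any embedded ##FASTA section is ignored.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    attributes = {}
    for pair in attr_text.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values (e.g. several Parent IDs) are comma separated.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid, "source": source, "type": ftype,
        # Coordinates are kept as written in the GFF3 line (1-based, inclusive).
        "start": int(start), "end": int(end),
        "strand": {"+": 1, "-": -1}.get(strand, 0),
        "attributes": attributes,
    }


def gff3_to_rows(lines):
    """Return (seqfeatures, locations, relationships) as lists of dicts.

    Hypothetical, simplified stand-ins for rows in the BioSQL seqfeature,
    location and seqfeature_relationship tables.
    """
    seqfeatures, locations, relationships = [], [], []
    key_by_id = {}  # GFF3 ID attribute -> seqfeature key
    pending = []    # (child key, parent GFF3 ID), resolved after the loop
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the sequence section is out of scope for this sketch
        if not line.strip() or line.startswith("#"):
            continue
        feat = parse_gff3_line(line)
        gff_id = feat["attributes"].get("ID", [None])[0]
        key = key_by_id.get(gff_id) if gff_id is not None else None
        if key is None:
            key = len(seqfeatures) + 1
            seqfeatures.append({"seqfeature_id": key, "bioentry": feat["seqid"],
                                "type_term": feat["type"],
                                "source_term": feat["source"]})
            if gff_id is not None:
                key_by_id[gff_id] = key
        # Lines sharing an ID become extra locations of the same feature.
        rank = 1 + sum(1 for loc in locations if loc["seqfeature_id"] == key)
        locations.append({"seqfeature_id": key, "start_pos": feat["start"],
                          "end_pos": feat["end"], "strand": feat["strand"],
                          "rank": rank})
        for parent_id in feat["attributes"].get("Parent", []):
            if (key, parent_id) not in pending:
                pending.append((key, parent_id))
    for child_key, parent_id in pending:
        # Raises KeyError if a Parent refers to an ID that never appears.
        relationships.append({"object_seqfeature_id": key_by_id[parent_id],
                              "subject_seqfeature_id": child_key,
                              "term": "part_of"})
    return seqfeatures, locations, relationships


# One gene, three mRNAs, and an exon that is part of all three mRNAs.
EXAMPLE = [
    "##gff-version 3",
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN",
    "ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00001;Parent=gene00001",
    "ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00002;Parent=gene00001",
    "ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mRNA00003;Parent=gene00001",
    "ctg123\t.\texon\t5000\t5500\t.\t+\t.\t"
    "ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003",
]

seqfeatures, locations, relationships = gff3_to_rows(EXAMPLE)
```

With this input the exon ends up with three relationship rows, one per
parent mRNA, which is the "three entries" case Peter mentions.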

Hope this helps move things forward,
Brad



> 
> Looking at some of the examples on
> http://www.sequenceontology.org/gff3.shtml this looks possible.  I
> assume each GFF file normally describes features on a single
> plasmid/chromosome, meaning a single bioentry table entry.  I would
> expect each GFF feature to become a seqfeature table entry (with a
> location table entry for each line describing its location), and the
> main sequence (if present in the GFF file), would be a biosequence
> table entry.  So far this isn't too complicated.  The GFF3
> documentation gives some examples of "parent" or rather "part-of"
> relationships between features (e.g. an exon which is part of three
> parent mRNA features).  Perhaps three entries in the
> seqfeature_relationship table could record this.
> 
> Also, GFF3 files seem to be very organised with regard to ontologies -
> something we have touched on before on this mailing list.
> 
> My reason for asking regards adding GFF parsing to Biopython.
> Biopython has a parsing framework (Bio.SeqIO) which turns various file
> formats (e.g. GenBank) into objects (SeqRecord objects, with optional
> SeqFeature objects), which we can map onto the BioSQL tables.  If we
> manage to integrate GFF parsing into Biopython's Bio.SeqIO framework
> (non-trivial), then Biopython would as a consequence be able to load a
> GFF file into BioSQL.  If any of the other Bio* projects can already
> import GFF files into BioSQL, I'd like Biopython to load the data into
> the database in the same way.  Essentially this would give a recipe
> for how we should model the GFF data in our objects in order to
> achieve this inter-project BioSQL compatibility.
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

