[Biopython-dev] [Bug 2762] GFF capability in SeqIO
bugzilla-daemon at portal.open-bio.org
Thu Feb 19 08:49:40 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2762
------- Comment #4 from lpritc at scri.sari.ac.uk 2009-02-19 03:49 EST -------
(In reply to comment #1)
> This looks like a nice idea, and possible with some simplifications to match
> our existing object scheme. For example, the current SeqRecord and SeqFeature
> classes do not let us explicitly define parent (part-of) relationships between
> SeqFeature objects (e.g. GFF3 examples where a CDS has a parent mRNA, or an
> exon may have multiple parent mRNAs). We do have the idea of sub-features, but
> this only allows a single parent and thus won't work here. This parent
> information could be recorded as just another SeqFeature qualifier dictionary
> entry.
I'm not sure that these relationships would need to complicate the SeqFeature
class model at all, and agree that the attribute tags indicating parenthood (in
the sense of a CDS having a parent mRNA, as opposed to the SeqRecord/SeqFeature
parent-child relationship) could potentially be treated just as
SeqFeature.qualifiers entries. The possibility of multiple parents (in
general, membership of more than one group) in GFF3 lends itself well to the
existing list representation of qualifiers.
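As a rough sketch of what I mean (not an existing Biopython parser): the function name and the feature IDs below are made up for illustration, and a plain dict of lists stands in for SeqFeature.qualifiers. It ignores the URL-escaping that GFF3 allows in attribute values.

```python
# Stand-in for SeqFeature.qualifiers: a dict mapping each tag to a list of strings.
# parse_gff3_attributes and the IDs used here are hypothetical, for illustration only.

def parse_gff3_attributes(column9):
    """Split a GFF3 attributes column into a qualifiers-style dict of lists."""
    qualifiers = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # In GFF3, multiple values for one tag are comma-separated.
        qualifiers.setdefault(tag, []).extend(value.split(","))
    return qualifiers


# An exon shared by two transcripts simply gets a two-element Parent list.
exon_qualifiers = parse_gff3_attributes("ID=exon00003;Parent=mRNA00001,mRNA00003")
print(exon_qualifiers["Parent"])
```

The point is only that a multi-parent exon needs nothing beyond the list-valued qualifiers we already have.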
I may be wrong but I think that at least some, if not all, of the relationships
you might be worried about (for example, those in your linked post to the
BioSQL list) are well-defined within the SOFA ontology. So, for example, a
BioSQL database with properly-configured SOFA ontology, and properly-defined
relationships, could be used to infer those parent-child relationships on the
basis of the corresponding term_ids. I don't think that's a behaviour we need
to expect from the SeqRecord/SeqFeature class models. Where possible, those
relationships could be rebuilt by another function or package, so long as the
SeqFeature object correctly records those descriptions as SOFA terms in the
qualifiers (or implicitly uses the SOFA ontology when depositing in a database -
but that's another enhancement request ;)). So I'm not sure that this needs to
complicate the SeqFeature class model either.
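To illustrate the "rebuilt by another function" idea, here is a minimal sketch. It assumes each feature carries GFF3-style ID and Parent entries in its qualifiers; the dicts are stand-ins for SeqFeature objects, and children_by_parent is a hypothetical helper, not something in Biopython. It does not consult the SOFA ontology at all, it only follows the Parent tags.

```python
from collections import defaultdict

# Flat, independent features as a parser might produce them.
# Plain dicts stand in for SeqFeature objects; all IDs are illustrative.
features = [
    {"type": "gene", "qualifiers": {"ID": ["gene1"]}},
    {"type": "mRNA", "qualifiers": {"ID": ["mRNA1"], "Parent": ["gene1"]}},
    {"type": "mRNA", "qualifiers": {"ID": ["mRNA2"], "Parent": ["gene1"]}},
    {"type": "exon", "qualifiers": {"ID": ["exon1"], "Parent": ["mRNA1", "mRNA2"]}},
]


def children_by_parent(features):
    """Map each parent ID to the IDs of the features that name it as a Parent."""
    children = defaultdict(list)
    for feature in features:
        quals = feature["qualifiers"]
        for parent_id in quals.get("Parent", []):
            children[parent_id].extend(quals.get("ID", []))
    return dict(children)


hierarchy = children_by_parent(features)
print(hierarchy)
```

Because the exon lists both mRNAs as parents, it shows up under each of them, and the feature objects themselves stay flat.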
(That said, maybe somewhere down the line there's a role for SQLite in handling
that sort of behaviour 'on-the-fly'...)
I may have misunderstood, but I think that this is still the same sort of
general arrangement that is already the case for GenBank files. When loading,
say, a bacterial chromosome, SeqRecord.seq gets the chromosome sequence, and
the gene, CDS, and various misc_features for a single gene are imported as -
essentially - independent features. We can unite them, after the fact, by
gene name, or locus_tag, or some other attribute, which is essentially the same
kind of operation as uniting a CDS with its parent gene via the SOFA ontology
and the Parent tag for upload into a SOFA-compliant instance of BioSQL.
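For comparison, the GenBank-style "unite after the fact" step might look something like the sketch below. Again the dicts stand in for SeqFeature objects, and the locus tags and the group_by_qualifier helper are invented for the example. With a real SeqRecord one would loop over record.features and read feature.type and feature.qualifiers in the same way.

```python
from collections import defaultdict

# Independent features as loaded from a GenBank-style record.
# Dicts stand in for SeqFeature objects; locus tags are illustrative.
features = [
    {"type": "gene", "qualifiers": {"locus_tag": ["ABC_0001"]}},
    {"type": "CDS", "qualifiers": {"locus_tag": ["ABC_0001"],
                                   "product": ["hypothetical protein"]}},
    {"type": "gene", "qualifiers": {"locus_tag": ["ABC_0002"]}},
    {"type": "misc_feature", "qualifiers": {"note": ["no locus_tag here"]}},
]


def group_by_qualifier(features, key="locus_tag"):
    """Group feature types by the values of one qualifier; skip features lacking it."""
    groups = defaultdict(list)
    for feature in features:
        for value in feature["qualifiers"].get(key, []):
            groups[value].append(feature["type"])
    return dict(groups)


by_locus = group_by_qualifier(features)
print(by_locus)
```

Passing key="Parent" instead of the default would do the equivalent grouping for GFF3-derived features, which is why I see the two cases as the same kind of operation.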