[Biopython] GFF.writer

Brad Chapman chapmanb at 50mail.com
Mon May 6 11:03:20 UTC 2013


Mic;

> Thank you, it is working, but I have a few questions after running the
> code below:
[...]
> * How can I avoid getting e.g. %20, and is there a way to get the
> attributes in the order ID, note in the output below?
> note=F5M15.26%20n%3A1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1

Apologies, I was escaping more than the GFF specification requires. I
checked in a fix so that spaces and colons are no longer escaped. If you
get the latest version from GitHub it will avoid this issue.
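For reference, the GFF3 specification only requires percent-encoding a small set of characters inside column 9 attribute values: tab, newline, carriage return and other control characters, plus `;`, `=`, `&`, `,` and `%` itself. Spaces and colons may be written as-is. Below is a minimal plain-Python sketch of that rule; `escape_attribute_value` is a hypothetical helper written for illustration, not the function the BCBio writer actually uses.

```python
# Characters GFF3 reserves inside column 9 attribute values
# (control characters such as tab and newline are handled separately below).
_RESERVED = set(";=&,%")


def escape_attribute_value(value):
    """Percent-encode only the characters GFF3 reserves in column 9.

    Hypothetical helper: spaces and colons are left untouched, so a note
    like "n:1 Tax:Arabidopsis thaliana" stays readable.
    """
    out = []
    for ch in str(value):
        code = ord(ch)
        if ch in _RESERVED or code < 0x20 or code == 0x7F:
            out.append("%{:02X}".format(code))  # e.g. ";" -> "%3B"
        else:
            out.append(ch)
    return "".join(out)


print(escape_attribute_value("F5M15.26 n:1 Tax:Arabidopsis thaliana"))
print(escape_attribute_value("a;b=c"))
```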

I also checked in an update that writes the key/value attributes in
alphabetical order. The spec doesn't define an ordering for them, but I
agree that a consistent one would be nice.
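A consistent ordering can be as simple as sorting the attribute keys before joining them. This is only a sketch under stated assumptions, not the code that was checked in: `format_attributes` is a hypothetical helper, it sorts keys case-insensitively (so `ID` comes before `note`), and it expects values that have already been escaped.

```python
def format_attributes(qualifiers):
    """Build a GFF3 column 9 string with keys in a stable alphabetical order.

    Hypothetical helper: values are assumed to be escaped already.
    List or tuple values become comma-separated multi-values.
    """
    parts = []
    for key in sorted(qualifiers, key=str.lower):  # case-insensitive sort
        value = qualifiers[key]
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append("{}={}".format(key, value))
    return ";".join(parts)


print(format_attributes({"note": "F5M15.26 n:1", "ID": "gene1"}))
```

Sorting case-insensitively is a design choice made here for readability; a plain `sorted()` would also put `ID` first, because uppercase letters sort before lowercase ones in ASCII.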

Thanks for all the useful feedback.

> * How can I set a score on sub_features? The code above caused an
> error.

You need to specify the score as part of each SeqFeature's qualifiers.
Here is your code with that fix:

from BCBio import GFF
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

out_file = "your_file.gff"
seq = Seq("GATCGATCGATCGATCGATC")
rec = SeqRecord(seq, "ID1")

# Top-level gene feature; GFF attributes such as ID and note go in qualifiers.
qualifiers = {"source": "prediction",
              "note": "F5M15.26 n:1 Tax:Arabidopsis thaliana RepID:Q9LMV1_ARATH",
              "ID": "gene1"}
# The strand is given on the location itself.
top_feature = SeqFeature(FeatureLocation(0, 20, strand=1), type="gene",
                         qualifiers=qualifiers)

# Exon sub-features; each score is passed through the qualifiers.
top_feature.sub_features = [
    SeqFeature(FeatureLocation(0, 5, strand=1), type="exon",
               qualifiers={"source": "prediction", "score": 12}),
    SeqFeature(FeatureLocation(15, 20, strand=1), type="exon",
               qualifiers={"source": "prediction", "score": -13}),
]
rec.features = [top_feature]

with open(out_file, "w") as out_handle:
    GFF.write([rec], out_handle)
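To see where a score ends up, here is a rough plain-Python sketch of a single GFF3 feature line. The score goes in column 6, with `.` meaning no score, and Biopython's 0-based, half-open `FeatureLocation` coordinates become GFF3's 1-based, inclusive ones. `gff3_line` is a hypothetical helper for illustration only, and the `Parent=gene1` attribute in the usage example is passed in by hand rather than generated.

```python
def gff3_line(seqid, source, ftype, start, end, score=None, strand=None,
              phase=None, attributes=""):
    """Format one tab-separated GFF3 feature line (hypothetical helper).

    start/end are 0-based, half-open, as in Biopython's FeatureLocation.
    """
    columns = [
        seqid,
        source,
        ftype,
        str(start + 1),                          # 0-based start -> 1-based
        str(end),                                # half-open end == inclusive end
        "." if score is None else str(score),    # column 6: score
        {1: "+", -1: "-"}.get(strand, "."),      # column 7: strand
        "." if phase is None else str(phase),    # column 8: phase
        attributes or ".",                       # column 9: attributes
    ]
    return "\t".join(columns)


print(gff3_line("ID1", "prediction", "exon", 0, 5, score=12, strand=1,
                attributes="Parent=gene1"))
```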

Peter:
> Just to give you advance warning, sub-features are being deprecated
> in the next release of Biopython. You'll still get them when parsing a
> GenBank file etc, but they won't be used when writing the GenBank
> file. Instead, there is a new CompoundLocation object.
> One of the reasons for doing this is that historically sub-features
> have been used for complex locations and NOT parent/child style
> relationships as in GFF.
>
> Brad - this would be a good thing for us to work on at the upcoming
> CodeFest in Berlin: http://www.open-bio.org/wiki/Codefest_2013

Agreed, I need to catch up with these changes in the latest release. I'm
also going to spend some time merging most of the functionality into
Ryan's gffutils library so that it can import and export Biopython objects:

https://github.com/daler/gffutils/tree/refactor
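On Peter's point about parent/child relationships: in GFF3 these are expressed through `ID` and `Parent` attributes rather than through nested or compound locations. As a small illustration, assuming tab-separated GFF3 input, the hypothetical `children_by_parent` helper below groups feature types under their parent IDs. The example lines are hand-written to mirror the gene and exons in the code above; they are not actual writer output, and attribute values are not unescaped.

```python
from collections import defaultdict


def children_by_parent(gff_lines):
    """Group GFF3 feature types by the IDs in their Parent attribute."""
    tree = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        # A feature may have several parents, given as a comma-separated list.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                tree[parent].append(cols[2])  # record the child's type
    return dict(tree)


example = [
    "##gff-version 3",
    "ID1\tprediction\tgene\t1\t20\t.\t+\t.\tID=gene1",
    "ID1\tprediction\texon\t1\t5\t12\t+\t.\tParent=gene1",
    "ID1\tprediction\texon\t16\t20\t-13\t+\t.\tParent=gene1",
]
print(children_by_parent(example))
```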

Brad


