[Biopython] GFF.writer

Peter Cock p.j.a.cock at googlemail.com
Mon May 6 08:29:05 UTC 2013


On Mon, May 6, 2013 at 6:02 AM, Mic <mictadlo at gmail.com> wrote:
> Hi Brad,
> Thank you, it is working, but I have a few questions about running the
> code below:
> from BCBio import GFF
> from Bio.Seq import Seq
> from Bio.SeqRecord import SeqRecord
> from Bio.SeqFeature import SeqFeature, FeatureLocation
>
> out_file = "your_file.gff"
> seq = Seq("GATCGATCGATCGATCGATC")
> rec = SeqRecord(seq, "ID1")
> qualifiers = {"source": "prediction",
>               "note": "F5M15.26 n:1 Tax:Arabidopsis thaliana RepID:Q9LMV1_ARATH",
>               "ID": "gene1"}
> sub_qualifiers = {"source": "prediction"}
> top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1,
>                          qualifiers=qualifiers)
> top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon",
>                                        strand=1, score=12,
>                                        qualifiers=sub_qualifiers),
>                             SeqFeature(FeatureLocation(15, 20), type="exon",
>                                        strand=1, score=-13,
>                                        qualifiers=sub_qualifiers)]
> rec.features = [top_feature]
>
> with open(out_file, "w") as out_handle:
>     GFF.write([rec], out_handle)
>
>
> * How can I avoid escapes such as %20, and is there a way to get the
> attributes in the order ID, note in the output below?
> note=F5M15.26%20n%3A1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1
>
> * How can I set a score on the sub_features? The code above raises the
> following error:
> Traceback (most recent call last):
>   File "problem.py", line 15, in <module>
>     qualifiers=sub_qualifiers),
> TypeError: __init__() got an unexpected keyword argument 'score'
>
> Thank you in advance
>
> Mic
>

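On the %20 question quoted above: the escapes look like URL-style percent-encoding, where %20 is a space and %3A is a colon. The following is only a minimal sketch using the Python standard library (not the GFF writer itself) to show that the encoded note round-trips to the original text, so nothing is lost in the file. The variable names are illustrative.

```python
from urllib.parse import quote, unquote

# The note value exactly as it appears in the written GFF attributes column.
encoded = "F5M15.26%20n%3A1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH"

# Decoding turns %20 back into a space and %3A back into a colon.
decoded = unquote(encoded)
print(decoded)

# Re-encoding with no extra "safe" characters reproduces the escaped form.
reencoded = quote(decoded, safe="")
print(reencoded == encoded)
```

A parser that understands this encoding should hand back the decoded string, so the escapes need not be a problem for downstream tools.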
Hi Mic,

Just to give you advance warning, sub-features are being deprecated
in the next release of Biopython. You'll still get them when parsing a
GenBank file etc., but they won't be used when writing a GenBank
file. Instead we have a new CompoundFeatureLocation.
One of the reasons for doing this is that historically sub-features
have been used for complex locations and NOT parent/child style
relationships as in GFF.
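
To illustrate the "complex location" use case, here is a rough sketch of one feature whose location is made of two parts, rather than a parent feature carrying sub-features. It assumes a Biopython release where the class is exposed as `CompoundLocation` in `Bio.SeqFeature` (the name it has in released versions); the coordinates and sequence are taken from the example above, and the variable names are illustrative.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

seq = Seq("GATCGATCGATCGATCGATC")

# Two exon-like parts on the forward strand.
part1 = FeatureLocation(0, 5, strand=1)
part2 = FeatureLocation(15, 20, strand=1)

# A single location built from both parts (the default operator is "join").
location = CompoundLocation([part1, part2])
feature = SeqFeature(location, type="CDS")

# The parts remain accessible individually.
parts = [(int(p.start), int(p.end)) for p in feature.location.parts]
print(parts)

# Extracting the feature concatenates the parts in order.
spliced = feature.extract(seq)
print(spliced)
```

This models a single spliced feature only; it does not express a GFF-style parent/child hierarchy such as gene and exon.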

Brad - this would be a good thing for us to work on at the upcoming
CodeFest in Berlin: http://www.open-bio.org/wiki/Codefest_2013

Peter


