[Biopython] GFF3 manipulation

Mic mictadlo at gmail.com
Sun Jun 30 07:46:49 UTC 2019


Hi all,
I have the following GFF3 file:

##gff-version 3
NbV1Ch01	NbGenome	gene	98177	99675	.	-	.	ID=Nb3PK39646.path1;Name=Peroxidase 40
NbV1Ch01	NbGenome	mRNA	98177	99675	.	-	.	ID=Nb3PK39646.mrna1;Parent=Nb3PK39646.path1;Name=Nb3PK39646;coverage=99.8;identity=99.9;matches=1011;mismatches=1;indels=0;unknowns=0
NbV1Ch01	NbGenome	exon	98177	98571	100	-	.	ID=Nb3PK39646.mrna1.exon4;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 620 1014 +
NbV1Ch01	NbGenome	CDS	98177	98571	100	-	2	ID=Nb3PK39646.mrna1.cds4;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 620 1014 +
NbV1Ch01	NbGenome	exon	98679	98844	100	-	.	ID=Nb3PK39646.mrna1.exon3;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 454 619 +
NbV1Ch01	NbGenome	CDS	98679	98844	100	-	0	ID=Nb3PK39646.mrna1.cds3;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 454 619 +
NbV1Ch01	NbGenome	exon	99134	99325	99	-	.	ID=Nb3PK39646.mrna1.exon2;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 262 453 +
NbV1Ch01	NbGenome	CDS	99134	99325	99	-	0	ID=Nb3PK39646.mrna1.cds2;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 262 453 +
NbV1Ch01	NbGenome	CDS	99417	99674	100	-	0	ID=Nb3PK39646.mrna1.cds1;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 4 261 +
NbV1Ch01	NbGenome	exon	99417	99675	100	-	.	ID=Nb3PK39646.mrna1.exon1;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 3 261 +
###
NbV1Ch01	NbGenome	gene	1115558	1121491	.	+	.	ID=Nb3PK08375.path1;Name=Putative uncharacterized protein At1g50120
NbV1Ch01	NbGenome	mRNA	1115558	1121491	.	+	.	ID=Nb3PK08375.mrna1;Parent=Nb3PK08375.path1;Name=Nb3PK08375;coverage=100.0;identity=100.0;matches=399;mismatches=0;indels=0;unknowns=0
NbV1Ch01	NbGenome	exon	1115558	1115879	100	+	.	ID=Nb3PK08375.mrna1.exon1;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 1 322 +
NbV1Ch01	NbGenome	CDS	1115558	1115879	100	+	0	ID=Nb3PK08375.mrna1.cds1;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 1 322 +
NbV1Ch01	NbGenome	exon	1121415	1121491	100	+	.	ID=Nb3PK08375.mrna1.exon2;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 323 399 +
NbV1Ch01	NbGenome	CDS	1121415	1121491	100	+	2	ID=Nb3PK08375.mrna1.cds2;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 323 399 +
###
...

I would like to copy the Name value ("Name=Peroxidase 40") from the gene
feature and add it to the mRNA feature as "Note=Peroxidase 40".
Unfortunately, the script below raises an error:

import pprint
from BCBio import GFF

in_file = "/Users/lorencm/tmp/Gmp_NbV1_Final.gff3"
out_file = "/Users/lorencm/tmp/Gmp_NbV1_Final.gff3.bak"
in_handle = open(in_file)

with open(out_file, "w") as out_handle:
    for rec in GFF.parse(in_handle):
        for feature in rec.features:
            print(feature)
            print(feature.qualifiers.get("Name"))
            print(feature.sub_features)
            print(feature.sub_features[0].qualifiers.get("Name"))
            print("!!!!change")
            feature.sub_features[0].qualifiers["Note"] = feature.qualifiers.get("Name")
            pprint.pprint(feature.sub_features[0].qualifiers.get("Note"))
            print("!!!DONE")
            out_file.write([rec])

in_handle.close()

Here is the output and the error:

python /projects/test.py
type: gene
location: [98176:99675](-)
id: Nb3PK39646.path1
qualifiers:
    Key: ID, Value: ['Nb3PK39646.path1']
    Key: Name, Value: ['Peroxidase 40']
    Key: source, Value: ['NbGenome']
['Peroxidase 40']
[SeqFeature(FeatureLocation(ExactPosition(98176), ExactPosition(99675), strand=-1), type='mRNA', id='Nb3PK39646.mrna1')]
['Nb3PK39646']
!!!!change
['Peroxidase 40']
!!!DONE
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    out_file.write([rec])
AttributeError: 'str' object has no attribute 'write'

What did I miss?

Thank you in advance,

Mic
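
A note on the traceback: it reports that out_file is a str, which matches
the script, where out_file holds the path and the handle opened by the
"with" statement is named out_handle. bcbio-gff documents
GFF.write(recs, out_handle) for writing records, so that is probably the
call the script needs. The sketch below shows the intended attribute
change itself. It is a dependency-free illustration that does not use
BCBio; the function name copy_gene_name_to_mrna_note is hypothetical, and
it assumes tab-separated GFF3 lines, a gene line that appears before its
mRNA, and a single Parent per mRNA.

```python
def copy_gene_name_to_mrna_note(gff3_lines):
    """Return GFF3 lines where each mRNA gains Note=<Name of its parent gene>.

    Assumptions (not from the original post): columns are tab-separated,
    each gene precedes its mRNA, and each mRNA has exactly one Parent.
    """
    gene_names = {}  # gene ID -> Name value
    out = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Directives (##...), comments, and non-feature lines pass through.
        if line.startswith("#") or len(cols) != 9:
            out.append(line)
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        if cols[2] == "gene" and "ID" in attrs and "Name" in attrs:
            gene_names[attrs["ID"]] = attrs["Name"]
        elif cols[2] == "mRNA" and "Note" not in attrs:
            name = gene_names.get(attrs.get("Parent"))
            if name is not None:
                cols[8] = cols[8].rstrip(";") + ";Note=" + name
        out.append("\t".join(cols))
    return out


sample = [
    "##gff-version 3",
    "NbV1Ch01\tNbGenome\tgene\t98177\t99675\t.\t-\t.\t"
    "ID=Nb3PK39646.path1;Name=Peroxidase 40",
    "NbV1Ch01\tNbGenome\tmRNA\t98177\t99675\t.\t-\t.\t"
    "ID=Nb3PK39646.mrna1;Parent=Nb3PK39646.path1;Name=Nb3PK39646",
]
result = copy_gene_name_to_mrna_note(sample)
print(result[2])
```

The gene line and the directive line are returned unchanged; only the
mRNA line gets the extra Note attribute appended to its last column.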