[Biopython] GFF3 manipulation
Mic
mictadlo at gmail.com
Sun Jun 30 07:46:49 UTC 2019
Hi all,
I have the following GFF3 file:
##gff-version 3
NbV1Ch01	NbGenome	gene	98177	99675	.	-	.	ID=Nb3PK39646.path1;Name=Peroxidase 40
NbV1Ch01	NbGenome	mRNA	98177	99675	.	-	.	ID=Nb3PK39646.mrna1;Parent=Nb3PK39646.path1;Name=Nb3PK39646;coverage=99.8;identity=99.9;matches=1011;mismatches=1;indels=0;unknowns=0
NbV1Ch01	NbGenome	exon	98177	98571	100	-	.	ID=Nb3PK39646.mrna1.exon4;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 620 1014 +
NbV1Ch01	NbGenome	CDS	98177	98571	100	-	2	ID=Nb3PK39646.mrna1.cds4;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 620 1014 +
NbV1Ch01	NbGenome	exon	98679	98844	100	-	.	ID=Nb3PK39646.mrna1.exon3;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 454 619 +
NbV1Ch01	NbGenome	CDS	98679	98844	100	-	0	ID=Nb3PK39646.mrna1.cds3;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 454 619 +
NbV1Ch01	NbGenome	exon	99134	99325	99	-	.	ID=Nb3PK39646.mrna1.exon2;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 262 453 +
NbV1Ch01	NbGenome	CDS	99134	99325	99	-	0	ID=Nb3PK39646.mrna1.cds2;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 262 453 +
NbV1Ch01	NbGenome	CDS	99417	99674	100	-	0	ID=Nb3PK39646.mrna1.cds1;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 4 261 +
NbV1Ch01	NbGenome	exon	99417	99675	100	-	.	ID=Nb3PK39646.mrna1.exon1;Parent=Nb3PK39646.mrna1;Name=Nb3PK39646;Target=Nb3PK39646 3 261 +
###
NbV1Ch01	NbGenome	gene	1115558	1121491	.	+	.	ID=Nb3PK08375.path1;Name=Putative uncharacterized protein At1g50120
NbV1Ch01	NbGenome	mRNA	1115558	1121491	.	+	.	ID=Nb3PK08375.mrna1;Parent=Nb3PK08375.path1;Name=Nb3PK08375;coverage=100.0;identity=100.0;matches=399;mismatches=0;indels=0;unknowns=0
NbV1Ch01	NbGenome	exon	1115558	1115879	100	+	.	ID=Nb3PK08375.mrna1.exon1;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 1 322 +
NbV1Ch01	NbGenome	CDS	1115558	1115879	100	+	0	ID=Nb3PK08375.mrna1.cds1;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 1 322 +
NbV1Ch01	NbGenome	exon	1121415	1121491	100	+	.	ID=Nb3PK08375.mrna1.exon2;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 323 399 +
NbV1Ch01	NbGenome	CDS	1121415	1121491	100	+	2	ID=Nb3PK08375.mrna1.cds2;Parent=Nb3PK08375.mrna1;Name=Nb3PK08375;Target=Nb3PK08375 323 399 +
###
...
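In this file each mRNA line is tied to its gene through the `Parent` attribute in column 9: the mRNA's `Parent=Nb3PK39646.path1` matches the gene's `ID=Nb3PK39646.path1`. A minimal stdlib-only sketch of splitting that column into a dictionary follows. `parse_attributes` is a hypothetical helper, and it deliberately ignores GFF3 percent-encoding, so it is an illustration rather than a full parser.

```python
def parse_attributes(column9):
    """Split a GFF3 column-9 string into a dict of tag -> list of values.

    Hypothetical, simplified helper: ignores percent-encoding (e.g. %3B) and
    assumes 'tag=value' pairs separated by ';', multiple values by ','.
    """
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        attrs[tag] = value.split(",")
    return attrs


gene = parse_attributes("ID=Nb3PK39646.path1;Name=Peroxidase 40")
mrna = parse_attributes(
    "ID=Nb3PK39646.mrna1;Parent=Nb3PK39646.path1;Name=Nb3PK39646;"
    "coverage=99.8;identity=99.9;matches=1011;mismatches=1;indels=0;unknowns=0"
)

# The mRNA's Parent equals the gene's ID, so the gene's Name can be looked up.
print(mrna["Parent"] == gene["ID"])
print(gene["Name"])
```

Values come back as lists because GFF3 allows several comma-separated values per tag; BCBio's `qualifiers` use the same list-valued shape, which is why the script's output prints `['Peroxidase 40']`.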
I would like to copy "Name=Peroxidase 40" from the gene feature and add it
to the mRNA feature as "Note=Peroxidase 40". Unfortunately, the script below
raises an error:
import pprint
from BCBio import GFF

in_file = "/Users/lorencm/tmp/Gmp_NbV1_Final.gff3"
out_file = "/Users/lorencm/tmp/Gmp_NbV1_Final.gff3.bak"

in_handle = open(in_file)
with open(out_file, "w") as out_handle:
    for rec in GFF.parse(in_handle):
        for feature in rec.features:
            print(feature)
            print(feature.qualifiers.get("Name"))
            print(feature.sub_features)
            print(feature.sub_features[0].qualifiers.get("Name"))
            print("!!!!change")
            feature.sub_features[0].qualifiers["Note"] = feature.qualifiers.get("Name")
            pprint.pprint(feature.sub_features[0].qualifiers.get("Note"))
            print("!!!DONE")
        out_file.write([rec])
in_handle.close()
Here is the output and the error:
python /projects/test.py
type: gene
location: [98176:99675](-)
id: Nb3PK39646.path1
qualifiers:
    Key: ID, Value: ['Nb3PK39646.path1']
    Key: Name, Value: ['Peroxidase 40']
    Key: source, Value: ['NbGenome']

['Peroxidase 40']
[SeqFeature(FeatureLocation(ExactPosition(98176), ExactPosition(99675), strand=-1), type='mRNA', id='Nb3PK39646.mrna1')]
['Nb3PK39646']
!!!!change
['Peroxidase 40']
!!!DONE
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    out_file.write([rec])
AttributeError: 'str' object has no attribute 'write'
What did I miss?
Thank you in advance,
Mic
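The traceback points at `out_file.write([rec])`. In the script, `out_file` is still the path string, while the file object opened by the `with` statement is `out_handle`. BCBio writes records through its module-level writer, `GFF.write(records, out_handle)`. Collecting the modified records in a list and passing that list once after the loop is the safer pattern. If a library-free route is acceptable, the same Name-to-Note copy can also be done on the raw text. The sketch below takes that plain-text approach, not the BCBio one. `copy_gene_name_to_mrna_note` is a hypothetical helper, and it assumes tab-separated 9-column records, gene lines preceding their mRNA lines (as in the file above), one `Parent` per mRNA, and no existing `Note` on the mRNA.

```python
def copy_gene_name_to_mrna_note(lines):
    """Append each parent gene's Name to its mRNA line as a Note attribute.

    Hypothetical sketch. Assumes tab-separated 9-column GFF3 records, each gene
    line appearing before its mRNA lines, a single Parent per mRNA, no existing
    Note on the mRNA, and no percent-encoded ';' or '=' in attribute values.
    Returns the lines without trailing newlines.
    """
    gene_names = {}  # gene ID -> gene Name
    out = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Directives ("##gff-version 3", "###"), comments and blank lines pass through.
        if line.startswith("#") or len(fields) != 9:
            out.append(line)
            continue
        attrs = dict(pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair)
        if fields[2] == "gene" and "ID" in attrs and "Name" in attrs:
            gene_names[attrs["ID"]] = attrs["Name"]
        elif fields[2] == "mRNA" and attrs.get("Parent") in gene_names:
            fields[8] += ";Note=" + gene_names[attrs["Parent"]]
        out.append("\t".join(fields))
    return out


# A trimmed copy of the first gene and mRNA lines from the file above.
example = [
    "##gff-version 3",
    "NbV1Ch01\tNbGenome\tgene\t98177\t99675\t.\t-\t.\tID=Nb3PK39646.path1;Name=Peroxidase 40",
    "NbV1Ch01\tNbGenome\tmRNA\t98177\t99675\t.\t-\t.\t"
    "ID=Nb3PK39646.mrna1;Parent=Nb3PK39646.path1;Name=Nb3PK39646",
]
for row in copy_gene_name_to_mrna_note(example):
    print(row)
```

To apply it to a whole file, pass the open input handle (iterating a file yields its lines). Then write `"\n".join(result) + "\n"` to the output handle. Because the function only appends to column 9 of mRNA lines, every other line is written back unchanged.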