[Biopython-dev] GFF parser bug?
Brad Chapman
chapmanb at 50mail.com
Mon Apr 26 11:56:01 UTC 2010
Eli;
> While trying to use the GFF parser I ran into a value error.
>
> I think it's probably due to one of the GFF3 fields in my file not being
> specified as 'key=value', but just as 'value'.
Thanks for the report. Oh boy, that's a pretty bad file. In addition
to the attribute that lacks the 'key=value' form you brought up, there
is also a Parent/Child reference problem. The second line in the GFF
you sent contains two issues:
- A duplicate ID value for GL0000006. ID values are supposed to be
unique in a file.
- The Parent=GL0000006 should be a reference to the initial
gene with that ID, but it also refers to itself.
> scaffold4215_3 glimmer gene 3 62 . - . ID=GL0000006;Name=GL0000006;Lack 3'-end;
> scaffold4215_3 glimmer mRNA 3 62 . - . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;
> scaffold4215_3 glimmer CDS 3 62 2.84 - 0 Parent=GL0000006;Lack 3'-end;
> scaffold4215_3 glimmer gene 124 1983 . - . ID=GL0000007;Name=GL0000007;Complete;
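For anyone who wants to check a file for these problems before parsing it, here is a minimal sketch in plain Python. It is not the GFFParser code; the names parse_attributes and check_gff3_lines are made up for illustration, and it assumes the nine columns are tab-separated as in the GFF3 spec (the sample lines above lost their tabs in the mail).

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into key=value pairs and bare entries."""
    pairs, bare = [], []
    for item in column9.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after a trailing ';'
        if "=" in item:
            key, value = item.split("=", 1)
            pairs.append((key, value))
        else:
            bare.append(item)  # e.g. "Lack 3'-end" or "Complete"
    return pairs, bare


def check_gff3_lines(lines):
    """Return (line_number, message) tuples for the problems found.

    Hypothetical helper: reports bare attributes, duplicate IDs and
    features whose Parent points at their own ID.
    """
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append((number, "expected 9 tab-separated columns"))
            continue
        pairs, bare = parse_attributes(columns[8])
        for item in bare:
            problems.append((number, "attribute without key=value: " + item))
        ids = [v for k, v in pairs if k == "ID"]
        # Parent may hold several comma-separated IDs
        parents = [p for k, v in pairs if k == "Parent" for p in v.split(",")]
        for feature_id in ids:
            if feature_id in seen_ids:
                problems.append((number, "duplicate ID: " + feature_id))
            seen_ids.add(feature_id)
            if feature_id in parents:
                problems.append((number, "Parent refers to itself: " + feature_id))
    return problems


# The four quoted lines, rebuilt with tabs between the nine columns.
SAMPLE_LINES = [
    "\t".join(["scaffold4215_3", "glimmer", "gene", "3", "62", ".", "-", ".",
               "ID=GL0000006;Name=GL0000006;Lack 3'-end;"]),
    "\t".join(["scaffold4215_3", "glimmer", "mRNA", "3", "62", ".", "-", ".",
               "ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;"]),
    "\t".join(["scaffold4215_3", "glimmer", "CDS", "3", "62", "2.84", "-", "0",
               "Parent=GL0000006;Lack 3'-end;"]),
    "\t".join(["scaffold4215_3", "glimmer", "gene", "124", "1983", ".", "-", ".",
               "ID=GL0000007;Name=GL0000007;Complete;"]),
]

if __name__ == "__main__":
    for number, message in check_gff3_lines(SAMPLE_LINES):
        print("line %d: %s" % (number, message))
```

On the sample it should flag the bare attribute on every line, plus the duplicate ID and the self-referencing Parent on the second line.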
As Peter mentioned, it would be useful to also file a bug with the
authors of the software that produces this file. Bringing its output
in line with the GFF3 spec will allow it to be handled by other GFF
parsers as well.
You can get a fixed version of the GFF parser that gracefully
handles these issues at:
http://github.com/chapmanb/bcbb/tree/master/gff/
or apply the changes to GFFParser directly:
http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67
Thanks much for the report. Let us know if you have any other
issues,
Brad