[Biopython-dev] GFF parser bug?

Eli Papa elipapa at mit.edu
Mon Apr 26 13:37:11 EDT 2010


Hi Brad,

Thanks for the quick reply! Hopefully, I'll be able to reciprocate in
the future.

The fix appears to work flawlessly so far, but I'll let you know if it
gives me other problems.
Unfortunately I have no control over the GFF (it was released to the
public as part of a published study).

It's unfortunately not clear from the methods section whether they
used Glimmer, MetaGene or some custom script to put the file
together. When I have some extra time, I'll certainly test which of
these programs is the culprit and let the author know about the
non-standard output format.
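A minimal sketch of such a check is below. The offending entries are bare values such as "Lack 3'-end" and "Complete" in the ninth (attributes) column, where GFF3 expects semicolon-separated key=value pairs. The helper names parse_attributes and find_bare_values are hypothetical and not part of Biopython; the sketch assumes the file is tab-delimited as GFF3 requires, and it does not decode percent-escapes.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into key=value pairs and bare values.

    GFF3 expects semicolon-separated key=value pairs, with commas separating
    multiple values. Fields lacking '=' (e.g. "Lack 3'-end") are collected in
    a separate list instead of raising an error. Percent-escapes are not
    decoded in this sketch.
    """
    pairs = {}
    bare = []
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # a trailing ';' leaves an empty field
        if "=" in field:
            key, value = field.split("=", 1)
            pairs.setdefault(key, []).extend(value.split(","))
        else:
            bare.append(field)
    return pairs, bare


def find_bare_values(lines):
    """Return (line_number, bare_values) for each feature line with bare values.

    Assumes tab-delimited GFF3; comment/directive lines and lines with fewer
    than nine columns (e.g. sequence lines after ##FASTA) are skipped.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        _, bare = parse_attributes(cols[8])
        if bare:
            problems.append((lineno, bare))
    return problems
```

Since find_bare_values only iterates over lines, an open file handle can be passed to it directly, which gives a per-line list to send back to the authors.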

cheers,
eli

On Mon, Apr 26, 2010 at 12:56 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Eli;
>
>> While trying to use the GFF parser I ran into a value error.
>>
>> I think it's probably due to one of the GFF3 fields in my file not being
>> specified as 'key=value', but just as 'value'.
>
> Thanks for the report. Oh boy, that's a pretty bad file. In addition
> to the lack of a value you brought up, there is also a Parent/Child
> reference problem. The second line in the GFF you sent contains two
> issues:
>
> - A duplicate ID value for GL0000006. ID values are supposed to be
>  unique in a file.
> - The Parent=GL0000006 should be a reference to the initial
>  gene with that ID, but it also refers to itself.
>
>> scaffold4215_3  glimmer gene    3       62      .       -       . ID=GL0000006;Name=GL0000006;Lack 3'-end;
>> scaffold4215_3  glimmer mRNA    3       62      .       -       . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;
>> scaffold4215_3  glimmer CDS     3       62      2.84    -       0 Parent=GL0000006;Lack 3'-end;
>> scaffold4215_3  glimmer gene    124     1983    .       -       . ID=GL0000007;Name=GL0000007;Complete;
>
> As Peter mentioned, it would be useful to also file a bug with the
> writers of the software that is producing this. Bringing it in line
> with the spec will allow it to be more widely handled by other GFF
> parsers.
>
> You can get a fixed version of the GFF parser that gracefully
> handles these issues at:
>
> http://github.com/chapmanb/bcbb/tree/master/gff/
>
> or apply the changes to GFFParser directly:
>
> http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67
>
> Thanks much for the report. Let us know if you have any other
> issues,
> Brad
>
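The two ID/Parent problems described in the reply (a repeated ID, and a Parent that points at the feature itself) can also be detected with a small standalone check before handing the file to any parser. This is a hypothetical helper (check_id_parent and feature_attributes are illustrative names), not part of Biopython or of the GFFParser fix linked in the reply; it assumes tab-delimited GFF3 input and keeps only key=value fields.

```python
from collections import Counter


def feature_attributes(lines):
    """Yield a key -> value dict for each tab-delimited GFF3 feature line.

    Only key=value fields are kept; bare values such as "Lack 3'-end" are
    ignored because this check only needs ID and Parent.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.strip().split("=", 1)
                attrs[key] = value
        yield attrs


def check_id_parent(lines):
    """Return (duplicate_ids, self_parented_ids), each sorted.

    A feature is self-parented when its own ID appears in its
    comma-separated Parent list.
    """
    features = list(feature_attributes(lines))
    id_counts = Counter(a["ID"] for a in features if "ID" in a)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    self_parented = sorted(
        a["ID"]
        for a in features
        if "ID" in a and a["ID"] in a.get("Parent", "").split(",")
    )
    return duplicates, self_parented
```

Because it flags every repeated ID, a feature that GFF3 legitimately spreads over several lines (for example a discontinuous CDS sharing one ID) would be reported too, so the output is best read as a list of entries to inspect rather than definite errors.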
