[Bioperl-l] bp_search2gff.pl

Eric Just e-just at northwestern.edu
Fri Oct 5 19:35:25 UTC 2007


Hello,

I have been playing with the bp_search2gff.pl script (on HEAD of
bioperl-live). There are a couple of issues I was wondering about.

One is the ID that gets generated for a match feature when the --match
option is set. The ID is set to the ID of the query sequence. This
can be problematic if you are representing the query sequence and the
blast hit in the same GFF file. When the resulting GFF file is loaded
into Chado, it also creates a problem if you have more than one hit
for a given query sequence, for example if you ran two different
analyses that each had a hit for a given query. Would it be possible
to have an option to create a unique ID for match features? One
suggestion would be to build the ID from the ID of the query + the ID
of the hit + the source.

As long as two different analyses were loaded as different sources,
this would ensure unique IDs for the match features.
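
To make the suggestion concrete, here is a rough sketch of the proposed
ID scheme. It is written in Python purely for illustration and is not a
patch to bp_search2gff.pl. The function names, the ":" separator and the
example IDs are all assumptions. The escaping covers only the characters
the GFF3 spec reserves in the attributes column.

```python
# Characters with reserved meaning in GFF3 column 9, plus tab/newline/CR.
GFF3_RESERVED = {
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
    "\t": "%09", "\n": "%0A", "\r": "%0D",
}


def gff3_escape(value):
    """Percent-encode the reserved characters listed above; leave the rest alone."""
    return "".join(GFF3_RESERVED.get(ch, ch) for ch in value)


def match_feature_id(query_id, hit_id, source, sep=":"):
    """Hypothetical helper: build a match ID from query + hit + source."""
    return gff3_escape(sep.join((query_id, hit_id, source)))


# Same query and hit, two different analyses (sources): the IDs differ.
id_blastn = match_feature_id("contig42", "gi|12345", "blastn")
id_tblastx = match_feature_id("contig42", "gi|12345", "tblastx")
print(id_blastn)
print(id_tblastx)
```

With this scheme the two analyses above get distinct IDs even though the
query and the hit are the same, which is the case that currently
collides.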


Also, is there a reason for writing the Target string as

Target=Sequence:SOME_ID

as opposed to

Target=SOME_ID


The latter seems a little more in line with the GFF3 spec and plays a
little nicer with the GMOD tools.
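
For reference, the GFF3 spec describes the Target value as
"target_id start end [strand]", with spaces in the target ID escaped as
%20. A small Python sketch of that format follows, for illustration
only. The function name and the example coordinates are assumptions and
are not taken from the script.

```python
def target_attribute(target_id, start, end, strand=None):
    """Hypothetical helper: format a GFF3 Target attribute.

    Produces "Target=target_id start end [strand]". Only spaces in the
    target ID are escaped here (as %20); other reserved characters are
    not handled in this sketch.
    """
    value = "{} {} {}".format(target_id.replace(" ", "%20"), start, end)
    if strand in ("+", "-"):
        value += " " + strand
    return "Target=" + value


# Example coordinates are made up for illustration.
print(target_attribute("SOME_ID", 1, 250))
print(target_attribute("SOME_ID", 1, 250, "+"))
```

Note that there is no "Sequence:" prefix on the target ID in this form.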

Thanks for looking into this.

Eric


