[BioRuby] blast -m7 (xml) and multiple queries
Ben Woodcroft
donttrustben at gmail.com
Sat Jun 28 04:26:14 UTC 2008
Hi,
I seem to have run across a bug in the BioRuby BLAST report parser: it
can't handle reports that span multiple query sequences. My parsing
code is:
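require 'bio'

# Iterate over every BLAST XML report in the input (files named on the
# command line, or stdin)
Bio::Blast.reports(ARGF) do |report|
  puts "Hits for " + report.query_def + " against " + report.db
  report.each do |hit|
    hit.each do |hsp|
      # One tab-separated line per HSP
      puts [
        report.query_def,
        hit.accession,
        hsp.query_from,
        hsp.query_to,
        hsp.hit_from,
        hsp.hit_to,
        hsp.evalue,
        hit.target_def
      ].join("\t")
    end
  end
end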
When I run this on BLAST XML output with 2 queries (the 1st has 10 hits
and the 2nd has 7), only 8 hits are shown, which is confusing. The
query sequences are similar, so they share some hits; perhaps that
partly explains the number 8.
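To check the expected counts independently of BioRuby, here is a minimal sketch that counts hits per query straight from the XML. It assumes Ruby's REXML library and the newer NCBI layout, where all queries share one `<BlastOutput>` element and each query is stored as its own `<Iteration>`; `hits_per_query` is a hypothetical helper name, and the element names are those of the NCBI BLAST XML format.

```ruby
require 'rexml/document'

# Hypothetical diagnostic: return [query_def, hit_count] for every query in
# a BLAST XML (-m 7) document. Assumes the newer layout, with one
# <Iteration> per query inside a single <BlastOutput>.
def hits_per_query(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/BlastOutput/BlastOutput_iterations/Iteration').map do |iter|
    # Assumes Iteration_query-def is present (newer format only)
    query_def = iter.elements['Iteration_query-def'].text
    [query_def, REXML::XPath.match(iter, 'Iteration_hits/Hit').size]
  end
end
```

For example, `p hits_per_query(File.read('two_queries.xml'))` on the file described above should list two entries, with 10 and 7 hits, which can be compared with what the BioRuby loop prints.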
I'm using BioRuby from git:
http://github.com/bioruby/bioruby/commit/a61b16163d3ca74f3f7c8d8e8f03f5f8c68dee60
and the newest BLAST (2.2.18).
Is this easy to fix? Is there a workaround?
A partial answer:
According to http://rubyforge.org//tracker/index.php?func=detail&aid=20272&group_id=769&atid=3037
this is a known, as yet unfixed bug, caused by a change in the NCBI XML
schema. I can work around it by re-running BLAST with the legacy flag -V.
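If re-running BLAST isn't convenient, another possible workaround is to bypass Bio::Blast::Report and walk each `<Iteration>` element yourself. This is a sketch only, assuming REXML and that `Iteration_query-def` is populated as in the newer format; `tabular_rows` is a hypothetical name, and it emits the same columns as the script above.

```ruby
require 'rexml/document'

# Hypothetical workaround: visit every <Iteration> (one per query) in a
# new-format BLAST XML document and collect one row per HSP, with the same
# columns as the BioRuby script: query def, accession, query from/to,
# hit from/to, E-value, hit definition.
def tabular_rows(xml)
  doc = REXML::Document.new(xml)
  rows = []
  REXML::XPath.each(doc, '/BlastOutput/BlastOutput_iterations/Iteration') do |iter|
    query_def = iter.elements['Iteration_query-def'].text
    REXML::XPath.each(iter, 'Iteration_hits/Hit') do |hit|
      accession = hit.elements['Hit_accession'].text
      target_def = hit.elements['Hit_def'].text
      REXML::XPath.each(hit, 'Hit_hsps/Hsp') do |hsp|
        # Coordinates are integers; the E-value is parsed as a float
        coords = %w[Hsp_query-from Hsp_query-to Hsp_hit-from Hsp_hit-to].map do |tag|
          hsp.elements[tag].text.to_i
        end
        evalue = hsp.elements['Hsp_evalue'].text.to_f
        rows << [query_def, accession, *coords, evalue, target_def]
      end
    end
  end
  rows
end
```

For example, `tabular_rows(ARGF.read).each { |row| puts row.join("\t") }` prints the same tab-separated lines as the original loop, but for every query. REXML builds the whole document in memory, which is fine for a few queries but may be slow for very large reports.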
Thanks in advance,
ben