[BioRuby] Bio::Blat::Report
Davide Rambaldi
davide.rambaldi at ifom-ieo-campus.it
Wed Sep 3 15:48:07 UTC 2008
Hi again, and sorry for all these e-mails.
I noticed a change in the report object (add_line method) after this commit:
http://github.com/bioruby/bioruby/commit/88b2fb24dddcd2d5d0715e8274eda1b1ebac0abd
+      # Adds a line to the entry if the given line is regarded as
+      # a part of the current entry.
+      # If the current entry (self) is empty, or the line has the same
+      # query name, the line is added and returns self.
+      # Otherwise, returns false (the line is not added).
+      def add_line(line)
+        if /\A\s*\z/ =~ line then
+          return @hits.empty? ? self : false
+        end
+        hit = Hit.new(line.chomp)
+        if @hits.empty? or @hits.first.query.name == hit.query.name then
+          @hits.push hit
+          return self
+        else
+          return false
+        end
       end
So now, if there is more than one query name in the input file, it will
automatically be split into separate reports, right?
That's cool (I had developed a method in my BLAT analyzer to group
hits by ID, which I can remove now).
The only issue I see: what happens with an input whose lines are swapped?
I don't believe it is a common case anyway, since BLAT PSL results are
ordered by query name, but it can happen if you change the order of the
PSL lines.
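To make the concern concrete, here is a small sketch in plain Ruby (not BioRuby code) of what I understand the new behaviour to be: consecutive lines with the same query name end up in the same report. The helpers psl_query_name, split_like_add_line and toy_psl_line are hypothetical names of mine, and I assume headerless, tab-separated PSL lines where the query name is the 10th column.

```ruby
# Hypothetical helper: return the query name (qName), assumed to be the
# 10th tab-separated column of a headerless PSL line.
def psl_query_name(line)
  line.chomp.split("\t")[9]
end

# Mimic the splitting: each run of consecutive lines sharing a query name
# becomes one group. Simplification: blank lines are just skipped here,
# whereas add_line treats a blank line after a hit as the end of the entry.
def split_like_add_line(lines)
  lines.reject { |l| l.strip.empty? }
       .chunk_while { |a, b| psl_query_name(a) == psl_query_name(b) }
       .to_a
end

# Toy 21-column line: only the qName column carries a meaningful value.
def toy_psl_line(qname)
  (["0"] * 9 + [qname] + ["0"] * 11).join("\t")
end

sorted  = %w[query1 query1 query2 query2 query3 query3].map { |q| toy_psl_line(q) }
swapped = %w[query1 query2 query1 query3 query2 query3].map { |q| toy_psl_line(q) }

puts split_like_add_line(sorted).size   # prints 3
puts split_like_add_line(swapped).size  # prints 6
```

The same six lines give 3 groups when sorted and 6 groups when swapped, because only neighbouring lines are compared.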
Consider this script:
#!/usr/local/bin/ruby -w
require 'bio'
Bio::FlatFile.open(Bio::Blat::Report, ARGF).each do |report|
  puts "object id: " + report.object_id.to_s + " hits: " + report.hits.size.to_s + " query name:" + report.query_id
end
Before the commit it gave only one object and (as stated in the doc)
only the first query name.
Now, with this test file:
-------------- next part --------------
3 lines of PSL output with 3 different query names.
Output:
object id: 277400 hits: 1 query name:query1
object id: 274620 hits: 1 query name:query2
object id: 271910 hits: 1 query name:query3
But with a PSL file like this one:
-------------- next part --------------
where we have 3 query names (2 hits each) and the lines are not in order:
object id: 277400 hits: 1 query name:query1
object id: 274620 hits: 1 query name:query2
object id: 272010 hits: 1 query name:query1
object id: 269350 hits: 1 query name:query3
object id: 266640 hits: 1 query name:query2
object id: 263930 hits: 1 query name:query3
If I sort the lines again by query name:
-------------- next part --------------
object id: 277400 hits: 2 query name:query1
object id: 273590 hits: 2 query name:query2
object id: 269800 hits: 2 query name:query3
So it doesn't work if you have unsorted lines (but I guess it is faster).
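As a possible workaround on the user side, the PSL lines could be regrouped before they are given to the parser. Below is a minimal sketch in plain Ruby, again assuming headerless, tab-separated PSL lines with the query name in the 10th column; psl_query_name and regroup_psl_lines are hypothetical helper names of mine, not part of BioRuby.

```ruby
# Hypothetical helper: return the query name (qName), assumed to be the
# 10th tab-separated column of a headerless PSL line.
def psl_query_name(line)
  line.chomp.split("\t")[9]
end

# Put lines with the same query name next to each other. Groups appear in
# order of first occurrence, and lines keep their original order inside
# each group (Hash preserves insertion order). Blank lines are dropped.
def regroup_psl_lines(lines)
  lines.reject { |l| l.strip.empty? }
       .group_by { |l| psl_query_name(l) }
       .values
       .flatten
end

# Toy 21-column lines: the first column holds the original position and
# only the qName column carries a meaningful value.
names   = %w[query1 query2 query1 query3 query2 query3]
swapped = names.each_with_index.map do |q, i|
  ([i.to_s] + ["0"] * 8 + [q] + ["0"] * 11).join("\t")
end

regrouped = regroup_psl_lines(swapped)
puts regrouped.map { |l| psl_query_name(l) }.join(" ")
# prints: query1 query1 query2 query2 query3 query3
```

The regrouped lines could then be written to a temporary file and read with the script above. This needs the whole file in memory, so it is only reasonable for small or medium PSL files.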
Sorry for my bad English and for this long mail.
Best regards,
Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy
[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------