[BioRuby] Bio::Blat::Report

Fri Sep 5 10:47:06 UTC 2008

On Sep 5, 2008, at 11:25 AM, Tomoaki NISHIYAMA wrote:

> Hi,
>
>> ehm... any good translator from Japanese to English (or, better, Italian!)? :P
>
> Here is a translation by the original sender:
>

Dear Nishiyama,

Thanks for the translation.

To follow up on the discussion: I agree, the splitter works well and is
fast (creating a hash can be a problem with big files).

In my script (BioRuby 1.2.1, not the latest git release) I group the
hits by query using group_by on query.name, which returns a Hash, as you
say.
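
A minimal sketch of the two strategies, in plain Ruby rather than the BioRuby API: `group_by` collects every hit into one Hash before anything else can happen, while the splitter idea walks the PSL output once and yields each run of consecutive lines that share a query name. `PslLine`, `parse_psl_line`, `group_all` and `each_query` are hypothetical names, and only a few of the 21 PSL columns are parsed (qName is the 10th column, tName the 14th).

```ruby
# Hypothetical minimal record: only the PSL columns this sketch needs.
PslLine = Struct.new(:matches, :mismatches, :query_name, :target_name)

# Data lines start with a number; header lines ("psLayout version 3",
# column titles, dashes) and blank lines do not.
PSL_DATA_LINE = /\A\d+\t/

def parse_psl_line(line)
  f = line.chomp.split("\t")
  # PSL columns (0-based): 0 matches, 1 misMatches, 9 qName, 13 tName
  PslLine.new(f[0].to_i, f[1].to_i, f[9], f[13])
end

# Strategy 1: read everything, then group. Memory grows with the whole file.
def group_all(lines)
  lines.grep(PSL_DATA_LINE).map { |l| parse_psl_line(l) }.group_by(&:query_name)
end

# Strategy 2: BLAT writes all hits of one query consecutively, so yield
# each run of same-query lines as soon as the query name changes.
def each_query(lines)
  return enum_for(__method__, lines) unless block_given?
  current_name = nil
  current_hits = []
  lines.each do |line|
    next unless line =~ PSL_DATA_LINE
    hit = parse_psl_line(line)
    if current_name && hit.query_name != current_name
      yield current_name, current_hits
      current_hits = []
    end
    current_name = hit.query_name
    current_hits << hit
  end
  yield current_name, current_hits unless current_hits.empty?
end
```

With a real file, something like `File.open("out.psl") { |io| each_query(io) { |name, hits| ... } }` (the file name is only an example) keeps one query's hits in memory at a time. If the hits of a query are not contiguous (the "mixed hits" case), this sketch reports that query as two separate groups.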

Also, for my sorting operations (by score, coverage, identity, etc.)
it is better to work on a small array containing only the hits for
one query.
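
Once the hits for one query sit in a small array, ranking them is a plain `sort_by`. A sketch, assuming nucleotide alignments: the score below mirrors the formula used by UCSC's `pslScore` (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert), and coverage is taken as (qEnd - qStart) / qSize, which is only one possible definition. `ScoredHit`, `parse_hit` and `rank_hits` are hypothetical names, not BioRuby methods.

```ruby
# Hypothetical record holding the PSL columns needed for ranking.
ScoredHit = Struct.new(:matches, :mismatches, :rep_matches, :q_num_insert,
                       :t_num_insert, :query_name, :q_size, :q_start, :q_end,
                       :target_name) do
  # Nucleotide score in the style of UCSC pslScore:
  # matches + repMatches/2 - misMatches - qNumInsert - tNumInsert
  def score
    matches + (rep_matches >> 1) - mismatches - q_num_insert - t_num_insert
  end

  # Assumed definition: fraction of the query spanned by the alignment.
  def coverage
    (q_end - q_start).to_f / q_size
  end
end

def parse_hit(line)
  f = line.chomp.split("\t")
  # PSL columns (0-based): 0 matches, 1 misMatches, 2 repMatches,
  # 4 qNumInsert, 6 tNumInsert, 9 qName, 10 qSize, 11 qStart, 12 qEnd, 13 tName
  ScoredHit.new(f[0].to_i, f[1].to_i, f[2].to_i, f[4].to_i, f[6].to_i,
                f[9], f[10].to_i, f[11].to_i, f[12].to_i, f[13])
end

# Best hit first: higher score wins, ties broken by higher coverage.
def rank_hits(hits)
  hits.sort_by { |h| [-h.score, -h.coverage] }
end
```

`rank_hits(hits).first` then gives the best-scoring hit for that query; sorting by identity would just add another key to the `sort_by` array.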

Soon I will publish the code of my blatanalyzer (Ruby version)
somewhere; any suggestions on where to put it?

Thanks again for the kind translation,

Davide

> -- start of translation
> This is Nishiyama, from Kanazawa.
>
> When a multi-FASTA file is used as the query, blat, unlike BLAST,
> does not output a header for each query; instead it
> outputs the query and target IDs on every line.
>
> Bio::Blat::Report, following that behavior, seems to return
> one entry containing many hits. As a user, however, I do not want to
> split the input into one file per query before searching, yet I want
> the results aggregated per query, for example to get the best hit
> location for each query.
>
> Although there is no separator in blat's output, the results
> for the same query come consecutively.
> When processing the output as a FlatFile, it would be useful
> to return each block of lines with the same query name as one "entry",
> so I made "flatfile_splitter".
> Because each line is already parsed to determine the split position,
> the return value is an Array of Hit objects, so that Hit.new
> need not be called again (this makes about a 20% difference in speed).
>
> When processing a PSL file of 100-200 Mbytes, an approach that reads
> the whole data into a Hash and then processes the hits for each query
> required more than several Gbytes of memory; with this approach
> much less memory is sufficient.
>
> What do you think?
>
> -- end of translation
>
> The rest is the diff of the source code.
> Note that the class and file names have been changed to avoid
> collisions, and the behavior of the original class is unchanged.
>
> On 2008/09/04, at 18:11, Davide Rambaldi wrote:
>
>>
>> On Sep 4, 2008, at 5:52 AM, Naohisa GOTO wrote:
>>
>>> This is somewhat incompatible, but good for speed and memory usage.
>>> In addition, some people have requested it:
>>> http://lists.open-bio.org/pipermail/bioruby-ja/2007-August/000137.html
>>> (Mailing list written in Japanese)
>>
>>
>> ehm... any good translator from Japanese to English (or, better, Italian!)? :P
>>
>> Anyway, I agree that the strange case of mixed hits can be ignored.
>>
>> Will these commits be available in the next version of BioRuby?
>>
>> I have the edge version of BioRuby on my laptop, but not on the cluster...
>>
>> Last question (sorry for asking so much): is there a way to
>> generate BioRuby docs that can be queried with the ri command?
>>
>> ri Bio::Blat::Report
>> Nothing known about Bio::Blat::Report
>>
>>
>> Thanks!
>>
>> Davide Rambaldi,
>> Bioinformatics PhD student.
>> -----------------------------------------------------
>> Bioinformatic Group IFOM-IEO Campus
>> Via Adamello 16, Milano
>> I-20139 Italy
>>
>> [t] +39 02574303 066
>> [e] davide.rambaldi at ifom-ieo-campus.it
>> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage)
>> [i] http://www.semm.it             (PhD school)
>> [i] http://www.btbs.unimib.it/     (Master)
>>
>> -----------------------------------------------------
>>
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>
>
>
> -- 
> Tomoaki NISHIYAMA
>
> Advanced Science Research Center,
> Kanazawa University,
> 13-1 Takara-machi,
> Kanazawa, 920-0934, Japan
>
>
