[Bioperl-l] experimental Bio::Search::Tiling implementation

Mark A. Jensen maj at fortinbras.us
Tue May 19 15:31:48 EDT 2009


Hi All-

With the frequent posts concerning HSP tilings, I thought it was time
to create the sought-after Bio::Search::Tiling namespace, and attempt
to provide a robust and exact tiling algorithm. I think it's timely,
too, since Jason's usual remarks involve the use of wu-blast with 
the --links option, and wu-blast has recently turned commercial and
is evidently costly to obtain.

The namespace includes an abstract interface B:S:Tiling::TilingI, and
a concrete class called B:S:Tiling::MapTiling. The object is
constructed like so:

 $tiling = Bio::Search::Tiling::MapTiling->new($your_blast_hit);

and provides methods for identities(), conserved(), and length();
other stats could also be provided. Identities and conserved sites are
correctly estimated, accounting for multiple overlapping HSPs. There
is also a method next_tiling($type), where $type is 'hit', 'subject'
(alias for 'hit'), or 'query', which is an iterator stepping through all
minimal sets of HSPs that completely cover the 'hit' or 'query'
sequence. One feature is that the individual tilings do not need to be
generated to estimate the statistics; next_tiling provides the individual
tilings only if you want/need them. 
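
To illustrate why overlapping HSPs need care, here is a minimal
standalone sketch in plain Perl. It is not the MapTiling code: the HSP
coordinates and identity counts are made up, and the names
disjoint_intervals, estimate_identities, 'avg' and 'max' are
hypothetical, chosen for this illustration only, and are not claimed to
correspond to the module's own methods. The idea shown is to cut the
covered region into disjoint intervals at every HSP boundary, so that a
position covered by two HSPs is counted once rather than twice.

```perl
use strict;
use warnings;

# Hypothetical HSPs: [start, end, identities] in query coordinates
# (1-based, inclusive). Illustrative values, not from any BLAST report.
my @hsps = (
    [ 1,  100, 80 ],
    [ 51, 150, 60 ],
);

# Split the covered region into disjoint intervals at every HSP boundary.
# Returns a list of [start, end, \@covering_hsps].
sub disjoint_intervals {
    my @hsps = @_;
    my %cut;
    for my $h (@hsps) {
        $cut{ $h->[0] }     = 1;    # an interval begins at each HSP start
        $cut{ $h->[1] + 1 } = 1;    # and just past each HSP end
    }
    my @cuts = sort { $a <=> $b } keys %cut;
    my @ivals;
    for my $i ( 0 .. $#cuts - 1 ) {
        my ( $s, $e ) = ( $cuts[$i], $cuts[ $i + 1 ] - 1 );
        # keep only the HSPs that contain this whole interval
        my @cover = grep { $_->[0] <= $s && $_->[1] >= $e } @hsps;
        push @ivals, [ $s, $e, \@cover ] if @cover;
    }
    return @ivals;
}

# Estimate identities: in each interval, scale every covering HSP's
# identity fraction to the interval length, then combine the per-HSP
# estimates by averaging ('avg') or by taking the largest ('max').
sub estimate_identities {
    my ( $combine, @hsps ) = @_;
    my $total = 0;
    for my $iv ( disjoint_intervals(@hsps) ) {
        my ( $s, $e, $cover ) = @$iv;
        my $len = $e - $s + 1;
        my @est = map { $len * $_->[2] / ( $_->[1] - $_->[0] + 1 ) } @$cover;
        if ( $combine eq 'max' ) {
            my ($best) = sort { $b <=> $a } @est;
            $total += $best;
        }
        else {
            my $sum = 0;
            $sum += $_ for @est;
            $total += $sum / @est;    # @est in scalar context is its count
        }
    }
    return $total;
}

printf "avg: %.2f  max: %.2f  naive sum: %d\n",
    estimate_identities( 'avg', @hsps ),
    estimate_identities( 'max', @hsps ),
    $hsps[0][2] + $hsps[1][2];
```

For these two made-up HSPs, which overlap on positions 51-100, the
averaged estimate is 105 and the maximum-based estimate is 110, against
a naive sum of 140 that counts the overlap twice.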

I've made it available in a pre-alpha state on bioperl-dev. It's
working and workable, with plenty of pod: see the synopses. It would
be excellent if interested folks would try it out on their favorite
data. Some niceties are not yet implemented, so BLASTP data is your
best bet for success. Check it out via svn into a separate working
directory, and let me know if there are any questions.

Below is a table of comparison numbers using the current SearchUtils
tiling implementation and some of the new methods, on some test data
in t/data. Please see the pod for many more details.

Cheers, 
Mark


*****
Comparison of methods with (patched) Bio::Search::SearchUtils
using test data t/data/dcr1_sp.WUBLASTP

SU: SearchUtils
MT: MapTiling, using methods 'exact' (x), 'est' (e), 'max' (m), with
stats calculated on the query (q) or the subject (s); so MT(q:x) is
MapTiling, stats calculated on the query, with the exact method, etc.

Identities
Hit                            SU  MT(q:x)  MT(q:e)  MT(q:m)  MT(s:x)  MT(s:e)  MT(s:m)
sp|P34529.2|DCR1_CAEEL    1845.00  1845.00  1845.00  1845.00  1845.00  1845.00  1845.00
sp|Q9VCU9.1|DCR1_DROME     668.00   664.50   666.26   668.00   678.00   678.00   678.00
sp|Q9UPY3.2|DICER_HUMAN    706.00   706.00   706.00   706.00   706.00   706.00   706.00
sp|Q8R418.2|DICER_MOUSE    698.00   698.00   698.00   698.00   698.00   698.00   698.00
sp|P84634.1|DCL4_ARATH     341.00   341.00   341.00   341.00   341.00   341.00   341.00
sp|Q7S8J7.1|DCL1_NEUCR     403.00   402.00   403.47   403.00   374.17   371.65   379.00
sp|A4RKC3.2|DCL1_MAGGR     331.00   333.00   333.54   337.00   347.50   348.51   348.00
sp|Q0CW42.2|DCL1_ASPTN     387.00   387.50   388.07   388.00   381.00   382.67   389.00
sp|Q1DKI1.2|DCL1_COCIM     331.00   335.00   334.44   339.00   337.50   336.98   341.00
sp|A2RAF3.2|DCL1_ASPNC     282.00   277.50   279.39   282.00   289.00   289.00   289.00
sp|Q09884.1|DCR1_SCHPO     314.00   314.00   314.60   316.00   319.33   318.59   328.00
sp|A1CBC9.2|DCL1_ASPCL     343.00   338.50   341.51   343.00   333.50   332.13   340.00
sp|A1DE13.1|DCL1_NEOFI     339.00   342.50   343.06   346.00   348.00   348.45   351.00
sp|Q2VF19.1|DCL1_CRYPA     284.00   284.00   284.78   284.00   288.00   289.00   290.00
sp|Q0UI93.2|DCL1_PHANO     366.00   366.00   366.00   366.00   355.00   356.36   356.00
sp|Q4WVE3.3|DCL1_ASPFU     313.00   310.00   314.71   315.00   328.00   328.45   331.00
sp|Q2U6C4.2|DCL1_ASPOR     325.00   325.00   325.00   325.00   325.00   325.00   325.00
sp|Q2UNX5.1|DCL2_ASPOR     282.00   282.50   282.74   283.00   275.50   275.86   276.00
sp|Q4WA22.2|DCL2_ASPFU     319.00   319.00   318.95   319.00   320.00   320.00   320.00

Conserved Sites ('Positives')
Hit                            SU  MT(q:x)  MT(q:e)  MT(q:m)  MT(s:x)  MT(s:e)  MT(s:m)
sp|P34529.2|DCR1_CAEEL    1845.00  1845.00  1845.00  1845.00  1845.00  1845.00  1845.00
sp|Q9VCU9.1|DCR1_DROME     993.00   991.50   991.84   993.00  1010.00  1010.00  1010.00
sp|Q9UPY3.2|DICER_HUMAN   1011.00  1011.00  1011.00  1011.00  1011.00  1011.00  1011.00
sp|Q8R418.2|DICER_MOUSE   1005.00  1005.00  1005.00  1005.00  1005.00  1005.00  1005.00
sp|P84634.1|DCL4_ARATH     518.00   518.00   518.00   518.00   518.00   518.00   518.00
sp|Q7S8J7.1|DCL1_NEUCR     659.00   659.00   660.70   659.00   602.33   602.65   609.00
sp|A4RKC3.2|DCL1_MAGGR     535.00   538.00   538.59   543.00   563.50   564.61   564.00
sp|Q0CW42.2|DCL1_ASPTN     616.00   622.00   619.70   628.00   608.50   611.79   613.00
sp|Q1DKI1.2|DCL1_COCIM     548.00   552.50   550.64   557.00   554.00   554.47   557.00
sp|A2RAF3.2|DCL1_ASPNC     445.00   441.00   441.96   445.00   457.00   457.00   457.00
sp|Q09884.1|DCR1_SCHPO     509.00   502.50   503.94   509.00   508.67   511.01   522.00
sp|A1CBC9.2|DCL1_ASPCL     550.00   542.00   541.71   550.00   525.17   525.26   534.00
sp|A1DE13.1|DCL1_NEOFI     516.00   517.00   518.34   518.00   518.00   518.91   521.00
sp|Q2VF19.1|DCL1_CRYPA     464.00   462.50   463.91   464.00   470.00   471.74   472.00
sp|Q0UI93.2|DCL1_PHANO     633.00   633.00   633.00   633.00   612.50   611.97   613.00
sp|Q4WVE3.3|DCL1_ASPFU     484.00   481.00   485.86   484.00   500.00   500.51   503.00
sp|Q2U6C4.2|DCL1_ASPOR     515.00   515.00   515.00   515.00   515.00   515.00   515.00
sp|Q2UNX5.1|DCL2_ASPOR     473.00   474.00   473.07   475.00   462.50   462.57   463.00
sp|Q4WA22.2|DCL2_ASPFU     529.00   529.50   530.12   530.00   532.00   532.00   532.00



