[Bioperl-l] experimental Bio::Search::Tiling implementation
Steve Chervitz
sac at bioperl.org
Tue May 19 22:21:25 UTC 2009
Mark,
Great work. My SearchUtils tiling function has been lingering for far
too long (at least a decade).
Your comment about BLASTP is fitting. I was working almost exclusively
with BLASTP when developing the original tiling function and it seems
like the trouble ensued when using it with other blast flavors. There
was insufficient exploration of blast alignment edge cases. It would
be good to come up with a comprehensive collection of blast reports to
stress test your tiling impl. The set currently in t/data is a good
start, but may not be sufficient.
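One edge case worth having in any such collection is a hit whose HSPs overlap on the query or subject. The following is a minimal, self-contained sketch of why their lengths cannot simply be summed. It is not taken from SearchUtils or MapTiling; the helper name covered_length and the HSP coordinates are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): total number of positions
# covered by a set of closed intervals [start, end], counting each
# position once even where intervals overlap.
sub covered_length {
    my @ivs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my ($total, $cur_start, $cur_end) = (0);
    for my $iv (@ivs) {
        my ($s, $e) = @$iv;
        if (!defined $cur_start) {
            # first interval opens the first merged block
            ($cur_start, $cur_end) = ($s, $e);
        }
        elsif ($s <= $cur_end + 1) {
            # overlapping or adjacent: extend the current merged block
            $cur_end = $e if $e > $cur_end;
        }
        else {
            # gap: close the current block and start a new one
            $total += $cur_end - $cur_start + 1;
            ($cur_start, $cur_end) = ($s, $e);
        }
    }
    $total += $cur_end - $cur_start + 1 if defined $cur_start;
    return $total;
}

# Made-up HSP coordinates on the query; the first two overlap at 80..100.
my @hsps = ([1, 100], [80, 150], [200, 250]);

my $naive = 0;
$naive += $_->[1] - $_->[0] + 1 for @hsps;

printf "naive sum of HSP lengths: %d\n", $naive;
printf "covered length:           %d\n", covered_length(@hsps);
```

With these coordinates the naive sum is 222 while only 201 positions are actually covered. Identities and conserved counts are inflated in the same way if overlapping HSPs are simply added up.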
Cheers,
Steve
On Tue, May 19, 2009 at 12:31 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> Hi All-
>
> With the frequent posts concerning HSP tilings, I thought it was time
> to create the sought-after Bio::Search::Tiling namespace, and attempt
> to provide a robust and exact tiling algorithm. I think it's timely,
> too, since Jason's usual remarks involve the use of wu-blast with
> the --links option, and wu-blast has recently turned commercial and
> is evidently costly to obtain.
>
> The namespace includes an abstract interface B:S:Tiling::TilingI, and
> a concrete class called B:S:Tiling::MapTiling. The object is
> constructed like so
>
> $tiling = Bio::Search::Tiling::MapTiling->new($your_blast_hit);
>
> and provides methods for identities(), conserved(), and length();
> other stats could also be provided. Identities and conserved sites are
> correctly estimated, accounting for multiple overlapping HSPs. There
> is also a method next_tiling($type), where $type is 'hit', 'subject'
> (alias for 'hit'), or 'query', which is an iterator stepping through all
> minimal sets of HSPs that completely cover the 'hit' or 'query'
> sequence. One feature is that the individual tilings do not need to be
> generated to estimate the statistics; next_tiling provides the individual
> tilings only if you want/need them.
>
> I've made it available in a pre-alpha state on bioperl-dev. It's
> working and workable with plenty of pod: see the synopses. It would
> be excellent if interested folks would try it out on their favorite
> data. Some niceties are not yet implemented, so BLASTP data is your
> best bet for success. Check it out via svn into a separate working
> directory, and let me know if there are any questions.
>
> Below is a table of comparison numbers using the current SearchUtils
> tiling implementation and some of the new methods, on some test data
> in t/data. Please see pod for many more details.
>
> Cheers,
> Mark
>
>
> *****
> Comparison of methods with (patched) Bio::Search::SearchUtils
> using test data t/data/dcr1_sp.WUBLASTP
>
> SU: SearchUtils
> MT: MapTiling, using methods 'exact' (x), 'est' (e), 'max' (m),
> with stats calculated on the query (q) or the subject (s);
> so MT(q:x) is MapTiling, stats calculated on the query, with the exact
> method, etc.
>
> Identities
> Hit                           SU  MT(q:x)  MT(q:e)  MT(q:m)  MT(s:x)  MT(s:e)  MT(s:m)
> sp|P34529.2|DCR1_CAEEL   1845.00  1845.00  1845.00  1845.00  1845.00  1845.00  1845.00
> sp|Q9VCU9.1|DCR1_DROME    668.00   664.50   666.26   668.00   678.00   678.00   678.00
> sp|Q9UPY3.2|DICER_HUMAN   706.00   706.00   706.00   706.00   706.00   706.00   706.00
> sp|Q8R418.2|DICER_MOUSE   698.00   698.00   698.00   698.00   698.00   698.00   698.00
> sp|P84634.1|DCL4_ARATH    341.00   341.00   341.00   341.00   341.00   341.00   341.00
> sp|Q7S8J7.1|DCL1_NEUCR    403.00   402.00   403.47   403.00   374.17   371.65   379.00
> sp|A4RKC3.2|DCL1_MAGGR    331.00   333.00   333.54   337.00   347.50   348.51   348.00
> sp|Q0CW42.2|DCL1_ASPTN    387.00   387.50   388.07   388.00   381.00   382.67   389.00
> sp|Q1DKI1.2|DCL1_COCIM    331.00   335.00   334.44   339.00   337.50   336.98   341.00
> sp|A2RAF3.2|DCL1_ASPNC    282.00   277.50   279.39   282.00   289.00   289.00   289.00
> sp|Q09884.1|DCR1_SCHPO    314.00   314.00   314.60   316.00   319.33   318.59   328.00
> sp|A1CBC9.2|DCL1_ASPCL    343.00   338.50   341.51   343.00   333.50   332.13   340.00
> sp|A1DE13.1|DCL1_NEOFI    339.00   342.50   343.06   346.00   348.00   348.45   351.00
> sp|Q2VF19.1|DCL1_CRYPA    284.00   284.00   284.78   284.00   288.00   289.00   290.00
> sp|Q0UI93.2|DCL1_PHANO    366.00   366.00   366.00   366.00   355.00   356.36   356.00
> sp|Q4WVE3.3|DCL1_ASPFU    313.00   310.00   314.71   315.00   328.00   328.45   331.00
> sp|Q2U6C4.2|DCL1_ASPOR    325.00   325.00   325.00   325.00   325.00   325.00   325.00
> sp|Q2UNX5.1|DCL2_ASPOR    282.00   282.50   282.74   283.00   275.50   275.86   276.00
> sp|Q4WA22.2|DCL2_ASPFU    319.00   319.00   318.95   319.00   320.00   320.00   320.00
>
> Conserved Sites ('Positives')
> Hit                           SU  MT(q:x)  MT(q:e)  MT(q:m)  MT(s:x)  MT(s:e)  MT(s:m)
> sp|P34529.2|DCR1_CAEEL   1845.00  1845.00  1845.00  1845.00  1845.00  1845.00  1845.00
> sp|Q9VCU9.1|DCR1_DROME    993.00   991.50   991.84   993.00  1010.00  1010.00  1010.00
> sp|Q9UPY3.2|DICER_HUMAN  1011.00  1011.00  1011.00  1011.00  1011.00  1011.00  1011.00
> sp|Q8R418.2|DICER_MOUSE  1005.00  1005.00  1005.00  1005.00  1005.00  1005.00  1005.00
> sp|P84634.1|DCL4_ARATH    518.00   518.00   518.00   518.00   518.00   518.00   518.00
> sp|Q7S8J7.1|DCL1_NEUCR    659.00   659.00   660.70   659.00   602.33   602.65   609.00
> sp|A4RKC3.2|DCL1_MAGGR    535.00   538.00   538.59   543.00   563.50   564.61   564.00
> sp|Q0CW42.2|DCL1_ASPTN    616.00   622.00   619.70   628.00   608.50   611.79   613.00
> sp|Q1DKI1.2|DCL1_COCIM    548.00   552.50   550.64   557.00   554.00   554.47   557.00
> sp|A2RAF3.2|DCL1_ASPNC    445.00   441.00   441.96   445.00   457.00   457.00   457.00
> sp|Q09884.1|DCR1_SCHPO    509.00   502.50   503.94   509.00   508.67   511.01   522.00
> sp|A1CBC9.2|DCL1_ASPCL    550.00   542.00   541.71   550.00   525.17   525.26   534.00
> sp|A1DE13.1|DCL1_NEOFI    516.00   517.00   518.34   518.00   518.00   518.91   521.00
> sp|Q2VF19.1|DCL1_CRYPA    464.00   462.50   463.91   464.00   470.00   471.74   472.00
> sp|Q0UI93.2|DCL1_PHANO    633.00   633.00   633.00   633.00   612.50   611.97   613.00
> sp|Q4WVE3.3|DCL1_ASPFU    484.00   481.00   485.86   484.00   500.00   500.51   503.00
> sp|Q2U6C4.2|DCL1_ASPOR    515.00   515.00   515.00   515.00   515.00   515.00   515.00
> sp|Q2UNX5.1|DCL2_ASPOR    473.00   474.00   473.07   475.00   462.50   462.57   463.00
> sp|Q4WA22.2|DCL2_ASPFU    529.00   529.50   530.12   530.00   532.00   532.00   532.00