[Bioperl-l] low complexity filter in StandAloneBlastPlus

Carnë Draug carandraug+dev at gmail.com
Thu Feb 13 22:52:10 UTC 2014


On 13 February 2014 22:38, Carnë Draug <carandraug+dev at gmail.com> wrote:
> On 13 February 2014 22:09, Paul Cantalupo <pcantalupo at gmail.com> wrote:
>> Hi Carne,
>>
>> Take a look at the synopsis of
>> Bio::Tools::Run::StandAloneBlastPlus::BlastMethods. I think you need to use
>> the method_args parameter:
>>
>>  $result = $fac->blastn( -query => 'query_seqs.fas',
>>                          -outfile => 'query.bls',
>>                          -method_args => [ '-dust' => 'no' ] );
>>
>
> Hi Paul
>
> thanks for your reply, but where do you see this documentation? It is
> not in the last release [1], the bioperl-run repository [2], or
> bioperl-live [3] (which doesn't even have ::BlastMethods).
>
> Also, I did what you suggested but got a "Blast run: parameter 'dust' is not
> available for method 'tblastn'" error.
>
> This is my simple code, from the very start (including creation of the
> database):
>
> Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_data => $db_data,
>   -db_name => $db_name,
>   -create => 1,
> )->make_db();
>
> my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_name => $db_name,
> );
>
> foreach my $file (@files) {
>   my $result = $fac->tblastn(
>     -query   => $file,
>     -outfile => $file . '.bls',
>     -method_args => ['-dust' => 'no'],
>   );
>   ## do stuff with $result
> }
>
> Thank you
> Carnë
>
> [1] https://metacpan.org/pod/Bio::Tools::Run::StandAloneBlastPlus::BlastMethods
> [2] https://github.com/bioperl/bioperl-run/blob/master/lib/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm
> [3] https://github.com/bioperl/bioperl-live/blob/master/Bio/Tools/Run/StandAloneBlast.pm

For future reference, I just figured it out by reading Table C3 of the
blast+ manual [4] and some comments in the source of
Bio::Tools::Run::StandAloneBlastPlus. Basically, dustmasker is used
for nucleotides only and segmasker for protein, which is why "dust" is
rejected for tblastn. From your example, I deduced that I could use
"seg" instead to disable the filter for protein sequences. The
following works fine and gives me the same results as disabling the
low complexity filter in the NCBI web interface.

my $result = $fac->tblastn(
  -query   => $file,
  -outfile => $file . '.bls',
  -method_args => ['-seg' => 'no'],
);
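
To avoid running into the same error when switching between methods,
the choice between "dust" and "seg" can be wrapped in a small helper.
This is only a sketch: no_filter_args() and %FILTER_OPTION are names I
made up for illustration, they are not part of bioperl, and the mapping
reflects my reading of the blast+ manual, so check it against your
installed version.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: maps a blast+ method name to
# the option that controls its low complexity filter.
my %FILTER_OPTION = (
    blastn  => '-dust',    # nucleotide-level comparison: DUST
    blastp  => '-seg',     # protein-level comparisons: SEG
    blastx  => '-seg',
    tblastn => '-seg',
    tblastx => '-seg',
);

# Returns an array ref suitable for -method_args that disables the
# low complexity filter for the given method; dies on an unknown method.
sub no_filter_args {
    my ($method) = @_;
    my $option = $FILTER_OPTION{$method}
        or die "No known low complexity filter option for '$method'\n";
    return [ $option => 'no' ];
}
```

With this in place, the call above could be written with
-method_args => no_filter_args('tblastn'), and the same line would work
for blastn by changing only the method name.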

Also, for future reference, note that the blast+ default for this
filter is "no" (according to its manual), but the bioperl module
changes it to "yes". I'm guessing this is to use the same defaults as
the NCBI web interface.

Thank you all, one more time.

Carnë

[4] http://www.ncbi.nlm.nih.gov/books/NBK1763/



