[Bioperl-l] bp_classify_hits_kingdom.pl

Jason Stajich jason.stajich at gmail.com
Wed Jun 22 20:07:54 UTC 2011


Bernd, oops, thanks very much for noticing this. I was too fast with copy & paste. Now that the midday light is shining on the code, I see another typo in there that I'll fix as well.

I should be able to check this in within a second.
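
For anyone following the thread, the idea of keeping the gi-to-taxid pairs in SQLite instead of a tied DB_File hash can be sketched roughly as below. This is a minimal illustration in Python, not the Perl code in the script: the table name gi2taxid and the function names are hypothetical, and it assumes each line of gi_taxid_prot.dmp holds a GI and a taxid separated by whitespace.

```python
import sqlite3


def parse_dmp(lines):
    """Yield (gi, taxid) integer pairs from gi_taxid dump lines.

    Assumes two whitespace-separated integer columns per line;
    lines that do not match that shape are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield int(fields[0]), int(fields[1])


def build_index(conn, pairs):
    """Load (gi, taxid) pairs into a table keyed on gi (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gi2taxid ("
        "gi INTEGER PRIMARY KEY, taxid INTEGER NOT NULL)"
    )
    # One transaction for the whole load; committing per row would be
    # very slow with tens of millions of pairs.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO gi2taxid (gi, taxid) VALUES (?, ?)", pairs
        )


def lookup_taxid(conn, gi):
    """Return the taxid stored for a GI, or None if the GI is not indexed."""
    row = conn.execute(
        "SELECT taxid FROM gi2taxid WHERE gi = ?", (gi,)
    ).fetchone()
    return row[0] if row else None
```

With an index file on disk (sqlite3.connect on a path rather than ":memory:"), the load only has to happen once, and a lookup that returns None corresponds to the "no taxid for ..." case.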

Jason
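
On the format question further down the thread: BLAST+ -outfmt 6 writes twelve tab-separated columns by default (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). A rough sketch of pulling the GI and e-value out of one such line follows. It is Python rather than the script's Perl, the function name parse_hit is hypothetical, and it assumes subject ids look like "gi|NUMBER|..." as produced with -show_gis.

```python
import re

# Matches the numeric GI in subject ids such as "gi|51127506|ref|...".
GI_PATTERN = re.compile(r"gi\|(\d+)")


def parse_hit(line, max_evalue=1e-4):
    """Return (query, gi, evalue) for a usable hit line, else None.

    Returns None for blank lines, comment lines, lines with fewer than
    12 columns, hits above the e-value cutoff, and subjects without a GI.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None
    query, subject = fields[0], fields[1]
    evalue = float(fields[10])  # column 11 of the default layout
    if evalue > max_evalue:
        return None
    match = GI_PATTERN.search(subject)
    if match is None:
        return None
    return query, int(match.group(1)), evalue
```

Checking the column count before reading the e-value is what avoids the "uninitialized value" style of failure when a line is not in the expected tabular layout.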

On Jun 22, 2011, at 11:07 AM, Bernd Web wrote:

> Hi Jason,
> 
> I did GI to TAX mapping in Perl alone. Nice to know this script
> exists. Thanks for this.
> Just one question, I noticed on
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS:
> 
> line 96: my $dbh = tie(%gi2node, 'DB_File', 'gi2class');
> and
> line 100: my $dbh2 = my $dbh =
> DBI->connect("dbi:SQLite:dbname=$giidxfile","","");
> 
> So the second $dbh masks the earlier declaration.
> 
> 
> Cheers,
> Bernd
> 
> On Wed, Jun 22, 2011 at 6:04 PM, Jason Stajich <jason.stajich at gmail.com> wrote:
>> Hi Dan
>> 
>> Looks like -outfmt 6 is the right format for the BLAST+ toolkit; the equivalent is -m 8 or -m 9 for the C toolkit blastall application.
>> 
>> I think DB_File was falling over because the now 40M+ gi-to-taxid pairs were overwhelming DB_File and the BerkeleyDB implementation behind it.
>> 
>> To solve it I reimplemented it with SQLite -- which will require you to install DBD::SQLite.
>> 
>> I've checked in code to the main trunk in the github repo if you want to take a look -- you can either download the file https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS or check it out via git (recommended).
>> 
>> -jason
>> On Jun 21, 2011, at 7:25 AM, Jackson, Daniel wrote:
>> 
>>> Hi Jason,
>>> 
>>> My name is Dan and I'm hoping to use your bioperl script bp_classify_hits_kingdom.pl to categorise some ESTs I recently acquired. I've been stuck on this problem for days now - can you help?!? I suspect it's an easy and obvious solution.... I'm not a complete newbie to using scripts, but wouldn't say I'm experienced! I've just installed Bioperl and have generated a small BLASTx test file of my sequences searched against a local installation of GenBank's nr database. The BLAST search was run locally as follows:
>>> 
>>> gzgbio-48:~ djackson$ blastx -query /Users/djackson/Desktop/10_Vaceltia_contigs_fna.txt -db /Users/djackson/BLAST-2.2.25+/db/nr/nr -outfmt 6 -out /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt -numthreads 2 -evalue .00001 -show_gis -num_descriptions 10 -num_alignments 10 -max_target_seqs 10
>>> 
>>> 
>>> The results of this file are attached (BLASTx_10_Vaceltia_contigs_m6.txt). I realise the BLASTx output is supposed to be in -outfmt 8 or -outfmt 9, but providing these files to bp_classify_hits_kingdom.pl generates the following error:
>>> 
>>> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt  -e .0001
>>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt
>>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
>>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
>>> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 1.
>>> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 1.
>>> no GI in
>>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
>>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
>>> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 2.
>>> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 2.
>>> no GI in
>>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
>>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
>>> .
>>> .
>>> etc...
>>> .
>>> .
>>> no GI in
>>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9 total=1182
>>>                     1182 100.00%
>>> gzgbio-48:~ djackson$
>>> 
>>> 
>>> Providing the bp_classify_hits_kingdom.pl script with an -outfmt 6 format seems to get closer to a meaningful output, but still generates the following error:
>>> 
>>> 
>>> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt  -e .0001 -v
>>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt
>>> no taxid for 51127506
>>> no taxid for 51127506
>>> no taxid for 51127506
>>> no taxid for 317419045
>>> no taxid for 47219014
>>> .
>>> .
>>> etc...
>>> .
>>> .
>>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6 total=10
>>>                     10 100.00%
>>> gzgbio-48:~ djackson$
>>> 
>>> 
>>> Kind regards and thanks in advance,
>>> Dan
>>> 
>>> <BLASTx_10_Vaceltia_contigs_m6.txt>
>>> 
>>> 
>>> ---------------------------------------------------------------
>>> Junior Professor Daniel J. Jackson
>>> Courant Research Centre Geobiology
>>> Georg-August University of Göttingen
>>> Goldschmidtstr.3
>>> 37077 Göttingen
>>> Germany
>>> 
>>> Tel: +49 (0) 551 39 14177
>>> Fax: +49 (0) 551 39 7918
>>> 
>>> djackso at uni-goettingen.de
>>> http://www.uni-goettingen.de/en/102705.html
>>> ---------------------------------------------------------------
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 