[BioRuby] Wu-blast report parsing issue
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Thu Aug 9 16:15:45 UTC 2007
Hello,
I'm sorry for the late reply.
It seems the error is triggered by line 29 of your XML file:
<Hit_def></Hit_def>
The Hit_def element is empty.
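To find such elements in a larger report, a small helper like the following could be used. It is a hypothetical sketch (not part of BioRuby), written for a current Ruby, and it only looks for the literal forms <Hit_def></Hit_def> and <Hit_def/>:

```ruby
# Hypothetical diagnostic helper (not part of BioRuby): returns the
# 1-based line numbers of empty <Hit_def> elements in a BLAST XML string.
def empty_hit_def_lines(xml)
  found = []
  xml.each_line.with_index(1) do |line, lineno|
    # Matches <Hit_def></Hit_def> and the self-closing <Hit_def/> form only.
    found << lineno if line =~ %r{<Hit_def></Hit_def>|<Hit_def\s*/>}
  end
  found
end

report = <<~XML
  <Hit_num>1</Hit_num>
  <Hit_id>lcl|EXAMPLE</Hit_id>
  <Hit_def></Hit_def>
  <Hit_accession>EXAMPLE</Hit_accession>
XML
p empty_hit_def_lines(report)  # => [3]
```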
For sequences without a definition line, NCBI BLAST instead outputs
<Hit_def>No definition line found</Hit_def>
so the element is never empty.
This means that WU-BLAST XML output is sometimes
incompatible with NCBI BLAST XML output.
However, because the difference is very small,
I think BioRuby can be changed to handle it.
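As a rough illustration of how a parser could tolerate the empty element: the backtrace in your message ends with `clone' being called on nil, which suggests that no text was recorded for Hit_def. The sketch below assumes element text is collected into a hash keyed by tag name; the names are hypothetical and this is not the actual code in xmlparser.rb.

```ruby
# Hypothetical sketch, NOT the real BioRuby xmlparser.rb code.
# Assumption: the parser stores each element's text in a Hash keyed by tag
# name. An empty element such as <Hit_def></Hit_def> presumably produces
# no text event, so its key is absent and a plain lookup returns nil.
def hit_field(fields, tag)
  # nil.to_s is "", so a missing value becomes an empty String; dup then
  # returns an independent copy. Calling nil.clone directly is what raised
  # "can't clone NilClass (TypeError)" on Ruby 1.8.
  fields[tag].to_s.dup
end

fields = { 'Hit_id' => 'lcl|EXAMPLE', 'Hit_accession' => 'EXAMPLE' }
p hit_field(fields, 'Hit_def')  # => ""
p hit_field(fields, 'Hit_id')   # => "lcl|EXAMPLE"
```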
I can reproduce the same error with the following data:
(saved as database.fst)
--------------------------------------------------------------
>lcl|EXAMPLE
AGACATAACCCAAACAGAATAACCTGAAAGAGACCCACGACCATGCAGGGGACCTGGATG
GTGCTGTTGGCACTGATATTGGGCACCTTCGGGGAGCTTGCTATGGCCTTACAGTGCTAC
ACCTGTGCGAATCCTGTGAGTGCATCCAACTGTGTCACCACCACCCACTGCCACATCAAT
GAAACCATGTGCAAGACTACGCTCTACTCCCTGGAGATTGTTTTCCCTTTCCTGGGGGAC
TCCACGGTGACCAAGTCCTGCGCCAGCAAGTGTGAGCCTTCGGATGTGGATGGCATTGGG
CAAACCCGGCCAGTGTCCTGCTGCAATTCTGACCTATGCAACGTGGATGGGGCACCCAGC
CTGGGCAGTCCTGGTGGCCTGCTCCTTGCCCTGGCACTTTTCTTGCTCTTGGGTGTCCTG
CTGTAAAGCCATGGCCATCTAGCTCCACTCCCTTGTCCCTGACATCCCAGTTCCCTAATG
CCTAGAAGAAATACAATGGCCATCTGC
--------------------------------------------------------------
(saved as query.fst)
--------------------------------------------------------------
>Contig1
AGACATAACCCAAACAGAATAACCTGAAAGAGACCCACGACCATGCAGGGGACCTGGATG
GTGCTGTTGGCACTGATATTGGGCACCTTCGGGGAGCTTGCTATGGCCTTACAGTGCTAC
ACCTGTGCGAATCCTGTGAGTGCATCCAACTGTGTCACCACCACCCACTGCCACATCAAT
GAAACCATGTGCAAGACTACGCTCTACTCCCTGGAGATTGTTTTCCCTTTCCTGGGGGAC
TCCACGGTGACCAAGTCCTGCGCCAGCAAGTGTGAGCCTTCGGATGTGGATGGCATTGGG
CAAACCCGGCCAGTGTCCTGCTGCAATTCTGACCTATGCAACGTGGATGGGGCACCCAGC
CTGGGCAGTCCTGGTGGCCTGCTCCTTGCCCTGGCACTTTTCTTGCTCTTGGGTGTCCTG
CTGTAAAGCCATGGCCATCTAGCTCCACTCCCTTGTCCCTGACATCCCAGTTCCCTAATG
CCTAGAAGAAATACAATGGCCATCTGC
--------------------------------------------------------------
The sequence in query.fst is identical to the one in database.fst;
only the definition line differs.
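If you want to confirm that the two files differ only in the definition line, a minimal FASTA comparison in plain Ruby is enough. This is an illustrative sketch with hypothetical helper names (BioRuby is not needed), and the sample records are a shortened excerpt of the sequence above:

```ruby
# Minimal FASTA reader (sketch; assumes well-formed '>' records).
# Returns an array of [definition, sequence] pairs.
def read_fasta(text)
  records = []
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if line.start_with?('>')
      records << [line[1..-1], ''.dup]   # start a new record
    elsif !records.empty?
      records.last[1] << line            # append sequence line to current record
    end
  end
  records
end

# True when both inputs contain the same sequences in the same order,
# regardless of their definition lines.
def sequences_match?(text_a, text_b)
  read_fasta(text_a).map(&:last) == read_fasta(text_b).map(&:last)
end

database_excerpt = ">lcl|EXAMPLE\nAGACATAACC\nCAAACAGAAT\n"
query_excerpt    = ">Contig1\nAGACATAACC\nCAAACAGAAT\n"
p sequences_match?(database_excerpt, query_excerpt)  # => true
```

For the real files, pass File.read('database.fst') and File.read('query.fst') instead of the excerpts.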
Commands for WU-BLAST:
% xdformat -n database.fst
% wu-blastall -p blastn -i query.fst -d database.fst \
-o wu-blastn.xml -e 1e-10 -m 7 -F F
Commands for NCBI BLAST:
% formatdb -i database.fst -p F -o
% blastall -p blastn -i query.fst -d database.fst \
-o ncbi-blastn.xml -e 1e-10 -m 7 -F F
WU-BLAST report (excerpt):
<Hit_num>1</Hit_num>
<Hit_id>lcl|EXAMPLE</Hit_id>
<Hit_def></Hit_def>
<Hit_accession>EXAMPLE</Hit_accession>
<Hit_len>507</Hit_len>
NCBI BLAST report (excerpt):
<Hit_num>1</Hit_num>
<Hit_id>lcl|EXAMPLE</Hit_id>
<Hit_def>No definition line found</Hit_def>
<Hit_accession>EXAMPLE</Hit_accession>
<Hit_len>507</Hit_len>
So, for sequences without a definition line, the Hit_def element
in WU-BLAST output is incompatible with NCBI BLAST.
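Until BioRuby handles this, one possible workaround is to rewrite empty Hit_def elements before parsing, so that the WU-BLAST XML looks like the NCBI output above. This is a hypothetical helper, not a BioRuby API, and the BioRuby call in the comment assumes Bio::Blast.reports accepts a StringIO the same way it accepts a File:

```ruby
# Placeholder text that NCBI BLAST uses for a hit without a definition line
# (taken from the NCBI report excerpt above).
NCBI_NO_DEFINITION = 'No definition line found'

# Hypothetical workaround (not a BioRuby API): replace empty <Hit_def>
# elements, in either the <Hit_def></Hit_def> or <Hit_def/> form, with
# the NCBI placeholder so the parser always finds some text.
def normalize_wu_blast_xml(xml)
  xml.gsub(%r{<Hit_def></Hit_def>|<Hit_def\s*/>},
           "<Hit_def>#{NCBI_NO_DEFINITION}</Hit_def>")
end

puts normalize_wu_blast_xml("<Hit_id>lcl|EXAMPLE</Hit_id>\n<Hit_def></Hit_def>\n")

# Usage with BioRuby (not run here):
#   require 'bio'
#   require 'stringio'
#   xml = normalize_wu_blast_xml(File.read('wu-blastn.xml'))
#   reports = Bio::Blast.reports(StringIO.new(xml))
```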
The versions of WU-BLAST and NCBI BLAST were:
2.0MP-WashU [04-May-2006] [linux26-i786-ILP32F64 2006-05-09T12:19:58]
blastn 2.2.15 [Oct-15-2006]
> I've tried feeding my script normal (-m0) wublast output too. It
> doesn't crash, but @reportsArray.length == 0.
Bio::Blast.reports can only be used for XML output.
For the default text format (-m 0), changing line 49 to
49: @reportsArray = Bio::FlatFile.new(nil, @file).to_a
should work.
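If the same script has to accept both formats, it could pick the entry point by looking at the start of the file. In this sketch the helper names are hypothetical, and the check assumes that -m 7 output begins with an XML declaration or a BlastOutput element while anything else is text output; BioRuby is only required when load_blast_reports is actually called:

```ruby
# Hypothetical format check: XML (-m 7) reports are assumed to start with
# "<?xml" or "<BlastOutput"; everything else is treated as text output.
def blast_xml?(text)
  head = text.lstrip
  head.start_with?('<?xml') || head.start_with?('<BlastOutput')
end

# Hypothetical loader: returns an array of reports from +path+ using
# Bio::Blast.reports for XML and Bio::FlatFile (auto-detected) otherwise.
def load_blast_reports(path)
  require 'bio'
  File.open(path) do |io|
    xml = blast_xml?(io.read(1024).to_s)  # peek at the first 1 KB
    io.rewind                             # parse from the beginning
    if xml
      Bio::Blast.reports(io)
    else
      Bio::FlatFile.new(nil, io).to_a
    end
  end
end

p blast_xml?(%(<?xml version="1.0"?>\n<BlastOutput>))  # => true
p blast_xml?("BLASTN 2.0MP-WashU [04-May-2006]")       # => false
```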
Thank you,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
On Tue, 24 Apr 2007 21:03:56 +0200
Yannick Wurm <Yannick.Wurm at unil.ch> wrote:
> Hello,
>
> I've generated a blast report using wu-blastall with -m7 to get xml
> output.
> It should be easy to get this into ruby, but I'm having a hard time.
>
> Here's the error I get:
> #~/ruby/dotGraphOfStrongHits.rb simple.xml simple.xml.dot 1.0e-5
> /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb:158:in `clone': can't clone NilClass (TypeError)
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb:158:in `xmlparser_parse_hit'
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb:72:in `xmlparser_parse'
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb:41:in `xmlparser_parse'
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/report.rb:66:in `auto_parse'
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/report.rb:89:in `initialize'
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:115:in `reports'
>   from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:109:in `reports'
>   from /Users/yannickwurm/ruby/wublastReportParser.rb:49:in `loadBlastReport'
>   from /Users/yannickwurm/ruby/dotGraphOfStrongHits.rb:30:in `parseFile'
>   from /Users/yannickwurm/ruby/dotGraphOfStrongHits.rb:61
>
> The corresponding lines of wublastReportParser.rb are:
> 48: @file = File.open(@blast_report, IO::RDONLY)
> 49: @reportsArray = Bio::Blast.reports(@file)
>
>
> Does wublast not respect the standard blast xml output?
> I've tried feeding my script normal (-m0) wublast output too. It
> doesn't crash, but @reportsArray.length == 0.
>
> My xml-ed blast report is here:
> http://wwwpeople.unil.ch/yannick.wurm/simple.xml
>
>
> What am I doing wrong? Do you have ideas how to solve this issue?
>
>
> My version info:
> wu-blastall 2.2.6
> ruby 1.8.4 (2005-12-24) [powerpc-darwin]
> bio.rb,v 1.84 2007/04/05
>
> Thanks in advance for any pointers!
> yannick
>
> --------------------------------------------
> yannick . wurm @ unil . ch
> Ant Genomics, Ecology & Evolution @ Lausanne
> http://www.unil.ch/dee/page28685_fr.html
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby