[Bioperl-l] BLAST parsing broken

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Mon May 3 11:45:10 UTC 2010


Chris,

The latest additions to Bio::SearchIO::blast broke the parsing of normal
BLAST output: $result->query_name now returns undef.

(I am using the anonymous git now.) This commit still works:

commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
Date:   Sun Dec 20 04:39:58 2009 +0000

    Robson's patch for buggy blastpgp output

But this does not:

commit 9a89c3434597104dd50553e3562983d78d14a544
Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
Date:   Thu Apr 15 04:21:17 2010 +0000

    [bug 3031]

    patches for catching algorithm ref, courtesy Razi Khaja.

That makes it easy to find the diffs:

$ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 \
    9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
index 378023a..6f7eeeb 100644
--- a/Bio/SearchIO/blast.pm
+++ b/Bio/SearchIO/blast.pm
@@ -209,6 +209,7 @@ BEGIN {

         'BlastOutput_program'             => 'RESULT-algorithm_name',
         'BlastOutput_version'             => 'RESULT-algorithm_version',
+        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
         'BlastOutput_query-def'           => 'RESULT-query_name',
         'BlastOutput_query-len'           => 'RESULT-query_length',
         'BlastOutput_query-acc'           => 'RESULT-query_accession',
@@ -504,6 +505,26 @@ sub next_result {
                 }
             );
         }
+        # parse the BLAST algorithm reference
+        elsif(/^Reference:\s+(.*)$/) {
+            # want to preserve newlines for the BLAST algorithm reference
+            my $algorithm_reference = "$1\n";
+            $_ = $self->_readline;
+            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
+            # algorithm_reference, append it to what we parsed so far
+            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
+                $algorithm_reference .= "$_";
+                $_ = $self->_readline;
+            }
+            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
+            $self->_pushback($_);
+            $self->element(
+                {
+                    'Name' => 'BlastOutput_algorithm-reference',
+                    'Data' => $algorithm_reference
+                }
+            );
+        }
         # added Windows workaround for bug 1985
         elsif (/^(Searching|Results from round)/) {
             next unless $1 =~ /Results from round/;


I am not sure why the reference parsing breaks things. Maybe the new loop
eats too many lines from the result file, perhaps including the Query= line
that query_name is read from.
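
To make that guess concrete, here is a minimal, self-contained sketch of how
a loop with the same stop conditions as the patch could swallow a Query= line.
It is an illustration under assumptions, not a diagnosis: consume_reference()
and its array-based line queue are hypothetical stand-ins for _readline and
_pushback, the report lines are made up, and the trigger used here (a
separator line holding a single space instead of being strictly empty) is
only one way the loop could overrun. The defined() guard is an addition of
the sketch so that it stops at end of input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's line buffer; this is not BioPerl code.
# Takes the remaining report lines and the text captured from "Reference:".
sub consume_reference {
    my ($lines, $first) = @_;
    my @queue     = @$lines;
    my $reference = "$first\n";
    my $line      = shift @queue;

    # Same stop conditions as the patch: an empty line, RID: or Database:.
    # The defined() check is added here so the sketch ends at end of input.
    while (defined $line
        && $line !~ /^$/
        && $line !~ /^RID:/
        && $line !~ /^Database:/)
    {
        $reference .= $line;
        $line = shift @queue;
    }

    # Mimic _pushback: return the terminating line to the queue.
    unshift @queue, $line if defined $line;
    return ($reference, \@queue);
}

# Made-up report fragment: rest of the reference, then the query header.
my @tail = (
    "Gapped BLAST and PSI-BLAST: a new generation of protein database search\n",
    "programs, Nucleic Acids Res. 25:3389-3402.\n",
);
my @after = ("Query= test_sequence\n", "Database: example_db\n");
my $first = "Altschul et al. (1997),";

# Case 1: the separator after the reference is a strictly empty line.
my ($ref1, $rest1) = consume_reference([@tail, "\n", @after], $first);

# Case 2: the separator holds one space, so /^$/ does not match it.
my ($ref2, $rest2) = consume_reference([@tail, " \n", @after], $first);

# Count the Query= lines still available to the rest of the parser.
my $query_kept_1 = grep { /^Query=/ } @$rest1;
my $query_kept_2 = grep { /^Query=/ } @$rest2;

print "empty separator: Query= line kept? ", ($query_kept_1 ? "yes" : "no"), "\n";
print "space separator: Query= line kept? ", ($query_kept_2 ? "yes" : "no"), "\n";
```

In the second case the loop only stops at Database:, so the Query= line ends
up inside the reference text and never reaches the code that would set the
query name. Whether real reports trigger anything like this would need to be
checked against an actual failing file.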

Yours,

    -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia


