[Biopython-dev] Blast Parser error

christen Richard.Christen at unice.fr
Fri Dec 19 12:19:59 EST 2003


Hi there,
I have a problem with the BLAST parser.

##############################
The biology thing :
I have been using the BLAST parser in a loop to blast n sequences against
themselves, and then parse the output to build a distance matrix. I use a
sliding window to extract different parts of the n sequences. It was stupid
of me not to check, but I did not expect it: one sequence was much shorter
than the others, so I sent BLAST a sequence of zero length :-(
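
A minimal sketch of the kind of check that was missing, assuming the sequences are held in a dict of id -> string. The function name extract_windows and the example ids are hypothetical, not part of my script or of Biopython:

```python
def extract_windows(sequences, start, width):
    """Cut the window [start, start + width) out of every sequence.

    Returns (kept, skipped): kept maps id -> non-empty window, and
    skipped lists the ids whose window would have zero length.
    """
    kept = {}
    skipped = []
    for seq_id, seq in sequences.items():
        window = seq[start:start + width]
        if window:
            kept[seq_id] = window
        else:
            # The sequence ends before the window starts: do not send it to BLAST.
            skipped.append(seq_id)
    return kept, skipped


# Hypothetical data: seqB is shorter than the window start position.
sequences = {"seqA": "ACGTACGTACGT", "seqB": "ACGT"}
kept, skipped = extract_windows(sequences, start=6, width=4)
print(kept)     # windows that are safe to write to the FASTA file
print(skipped)  # ids to report or handle separately
```

Only the entries in kept would then be written to the file passed to formatdb and BLAST.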

This is confirmed by the formatdb log below; BLAST thus gives only a warning.
(Note that formatdb does not return the proper lcl|id of the sequence! I
will send a mail to NCBI about that.)
========================[ Dec 19, 2003  4:42 PM ]========================
Version 2.2.2 [Dec-14-2001]
Started database file "D:\Bases\Bac16S\BLAST\Chapon_4133-P"
WARNING: [000.000] lcl|50 has zero-length sequence

Formatted 90 sequences



As a result I got an error in the parser.


##############################
Error messages:
Traceback (most recent call last):
  File "test.py", line 24, in ?
    b_record = b_iter.next()  # fetch the next query
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1331, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 556, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 98, in feed
    self._scan_database_report(uhandle, consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 422, in _scan_database_report
    line = safe_readline(uhandle)
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."

##############################
test.py sample
...
b_parser=NCBIStandalone.BlastParser()   # create the parser
b_iter=NCBIStandalone.Iterator(blast_out, b_parser)  # create the iterator
...

23 while 1:
24     b_record = b_iter.next()  # fetch the next query
25
26     if b_record is None:
27         break      # no more "Query=" records to read
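
A sketch of a more defensive version of this loop, which records the parser error instead of letting it end the script. To keep it self-contained it does not use Biopython: collect_records and make_fake_iterator are hypothetical names, and the fake iterator only imitates b_iter.next() returning records, raising SyntaxError, or returning None. Whether the real iterator can safely continue after such an error is something I have not checked, so the sketch simply stops at the first failure:

```python
def collect_records(next_record):
    """Call next_record() until it returns None.

    Returns (records, error). error is None if everything parsed, else
    the message of the SyntaxError that interrupted the loop.
    """
    records = []
    error = None
    while True:
        try:
            record = next_record()
        except SyntaxError as exc:
            # Same exception type as in the traceback above.
            error = str(exc)
            break
        if record is None:
            break  # no more "Query=" records to read
        records.append(record)
    return records, error


def make_fake_iterator(items):
    """Hypothetical stand-in for b_iter.next, used for illustration only."""
    pending = iter(items)

    def next_record():
        item = next(pending, None)
        if isinstance(item, Exception):
            raise item
        return item

    return next_record


fake_next = make_fake_iterator(
    ["record 1", SyntaxError("Unexpected end of stream."), "record 2"]
)
records, error = collect_records(fake_next)
print(records, error)
```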


##############################
BLAST output, relevant section
BLASTN 2.2.2 [Dec-14-2001]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= lcl|97633|sp=CYB 296
         (0 letters)

Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
           90 sequences; 14,728 total letters



 ***** No hits found ******

  Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
    Posted date:  Dec 19, 2003  4:42 PM
  Number of letters in database: 14,728
  Number of sequences in database:  90

BLASTN 2.2.2 [Dec-14-2001]
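
The report above can be recognised before it reaches the parser, because the query is followed by "(0 letters)". A minimal sketch, assuming the plain-text output format shown here and a query name that fits on one line. zero_length_queries is a hypothetical helper, not a Biopython function, and the second query in the sample is invented for illustration:

```python
import re

# "Query= <name>" followed on the next line by "(0 letters)".
ZERO_LENGTH_QUERY = re.compile(
    r"^Query=\s*(.+?)\s*\n\s*\(0 letters\)", re.MULTILINE
)


def zero_length_queries(blast_text):
    """Return the names of all queries reported with zero letters."""
    return ZERO_LENGTH_QUERY.findall(blast_text)


sample_output = """\
Query= lcl|97633|sp=CYB 296
         (0 letters)

 ***** No hits found ******

Query= lcl|12345|example
         (120 letters)
"""

print(zero_length_queries(sample_output))
```

Such queries could then be skipped or reported before the text is handed to the iterator.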



##############################
useful pieces of code

def safe_readline(handle):
    """safe_readline(handle) -> line

    Read a line from an UndoHandle and return it.  If there are no more
    lines to read, I will raise a SyntaxError.

    """
    line = handle.readline()
    if not line:
        raise SyntaxError, "Unexpected end of stream."
        # ^ File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    return line
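
To show the behaviour in isolation, here is a sketch of the same function rewritten in Python 3 syntax and run on an in-memory handle. This is a re-expression for illustration, not the Biopython source; io.StringIO stands in for the UndoHandle:

```python
import io


def safe_readline(handle):
    """Read one line; raise SyntaxError if the handle is exhausted."""
    line = handle.readline()
    if not line:
        raise SyntaxError("Unexpected end of stream.")
    return line


# A handle holding only the last line of the database report shown above.
handle = io.StringIO("  Number of sequences in database:  90\n")

first = safe_readline(handle)      # succeeds: one line is available
try:
    safe_readline(handle)          # nothing left to read
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(repr(first))
print(error_message)
```

The second call fails in the same way as the call in the traceback: the text handed to the parser ends before the scanner has stopped reading.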



        consumer.start_database_report()
        while 1:
            read_and_call(uhandle, consumer.database, start='  Database')
            # Database can span multiple lines.
            read_and_call_until(uhandle, consumer.database, start='    Posted')
            read_and_call(uhandle, consumer.posted_date, start='    Posted')
            read_and_call(uhandle, consumer.num_letters_in_database,
                          start='  Number of letters')
            read_and_call(uhandle, consumer.num_sequences_in_database,
                          start='  Number of sequences')
            read_and_call(uhandle, consumer.noevent, start='  ')
            # next line: NCBIStandalone.py, line 422, in _scan_database_report
            line = safe_readline(uhandle)
            uhandle.saveline(line)
            if line.find('Lambda') != -1:
                break


    def feed(self, handle, consumer):
        """S.feed(handle, consumer)

        Feed in a BLAST report for scanning.  handle is a file-like
        object that contains the BLAST report.  consumer is a Consumer
        object that will receive events as the report is scanned.

        """
        if isinstance(handle, File.UndoHandle):
            uhandle = handle
        else:
            uhandle = File.UndoHandle(handle)

        # Try to fast-forward to the beginning of the blast report.
        read_and_call_until(uhandle, consumer.noevent, contains='BLAST')
        # Now scan the BLAST report.
        self._scan_header(uhandle, consumer)
        self._scan_rounds(uhandle, consumer)
        self._scan_database_report(uhandle, consumer)
        # ^ NCBIStandalone.py, line 98, in feed
        self._scan_parameters(uhandle, consumer)




#######################################



Thanks in advance




Richard CHRISTEN
Champion de saut en epaisseur
UMR6543 CNRS - Université de Nice Sophia Antipolis

Centre de Biochimie
Parc Valrose
06108 Nice cedex2

tel  33 - 492 076 947
fax 33 - 492 076 408



