[Biopython-dev] Blast Parser error
christen
Richard.Christen at unice.fr
Fri Dec 19 12:19:59 EST 2003
Hi there,
I have run into a problem with the BLAST parser.
##############################
The biology background:
I have been using the BLAST parser in a loop to blast n sequences against
themselves, and then parsing the output to build a distance matrix. I use a
sliding window to extract different parts of the n sequences. Stupidly, I did
not check for it because I did not expect it, but one sequence was much
shorter than the others, so I sent BLAST a sequence of zero length :-(
This is confirmed by the formatdb log below, so BLAST gives only a warning.
(Note that formatdb does not report the proper lcl|id of the sequence! I
will send a mail to NCBI about that.)
========================[ Dec 19, 2003 4:42 PM ]========================
Version 2.2.2 [Dec-14-2001]
Started database file "D:\Bases\Bac16S\BLAST\Chapon_4133-P"
WARNING: [000.000] lcl|50 has zero-length sequence
Formatted 90 sequences
As a result I got an error in the parser.
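One way to avoid sending an empty query is to check the window length before the sequence is written out for BLAST. What follows is only a minimal sketch, assuming the sequences are held as plain strings keyed by id; `extract_windows` and its `min_length` parameter are hypothetical names for illustration and are not part of Biopython. It is written in current Python syntax rather than Python 2.3.

```python
def extract_windows(sequences, start, size, min_length=1):
    """Slice the window [start, start + size) out of every sequence.

    Returns (kept, skipped): kept maps id -> window text, skipped lists
    the ids whose window is shorter than min_length (for example because
    the sequence ends before the window starts).
    """
    kept = {}
    skipped = []
    for seq_id, seq in sequences.items():
        window = seq[start:start + size]
        if len(window) < min_length:
            skipped.append(seq_id)
        else:
            kept[seq_id] = window
    return kept, skipped


# Hypothetical data: sequence "b" is too short for a window starting at 6.
seqs = {"a": "ACGTACGTAC", "b": "ACGT"}
kept, skipped = extract_windows(seqs, start=6, size=4)
```

Only the entries in `kept` would then be passed to formatdb and BLAST; the ids in `skipped` can be recorded as missing values in the distance matrix.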
##############################
Error messages:
Traceback (most recent call last):
  File "test.py", line 24, in ?
    b_record = b_iter.next() #recherche de la query suivante
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1331, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 556, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 98, in feed
    self._scan_database_report(uhandle, consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 422, in _scan_database_report
    line = safe_readline(uhandle)
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."
##############################
test.py sample
...
b_parser=NCBIStandalone.BlastParser() # create the parser
b_iter=NCBIStandalone.Iterator(blast_out, b_parser) # create the iterator
...
23 while 1:
24     b_record = b_iter.next() # fetch the next query
25
26     if b_record is None:
27         break # no more "Query=" records to read
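A possible workaround on the calling side is to catch the SyntaxError for the one bad report and carry on with the next. The sketch below is written in current Python syntax, not Python 2.3, and `iter_records`, `on_error`, `max_consecutive_errors` and `FakeIterator` are hypothetical names for illustration. It also rests on an assumption taken from the traceback: `Iterator.next()` seems to read the text of one report before handing it to the parser, so the failed report should already be consumed when the exception is raised. That should be verified against the installed NCBIStandalone.py; the error limit is there in case it is not true.

```python
def iter_records(next_record, on_error=None, max_consecutive_errors=10):
    """Yield records from next_record() until it returns None.

    next_record is any zero-argument callable, e.g. b_iter.next.
    A SyntaxError raised for one record is passed to on_error (if given)
    and the loop moves on. After too many failures in a row the error is
    re-raised, so a stuck iterator cannot loop forever.
    """
    failures = 0
    while True:
        try:
            record = next_record()
        except SyntaxError as err:
            failures += 1
            if failures > max_consecutive_errors:
                raise
            if on_error is not None:
                on_error(err)
            continue
        failures = 0
        if record is None:
            break
        yield record


class FakeIterator:
    """Stand-in for NCBIStandalone.Iterator, used only for this demo."""

    def __init__(self, items):
        self._items = list(items)

    def next(self):
        if not self._items:
            return None
        item = self._items.pop(0)
        if item == "bad":
            raise SyntaxError("Unexpected end of stream.")
        return item


errors = []
fake = FakeIterator(["rec1", "bad", "rec2"])
records = list(iter_records(fake.next, on_error=errors.append))
```

With the real iterator the call would look like `for b_record in iter_records(b_iter.next): ...`, which replaces the `while 1` loop and its `None` check.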
##############################
BLAST output, relevant section
BLASTN 2.2.2 [Dec-14-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= lcl|97633|sp=CYB 296
(0 letters)
Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
90 sequences; 14,728 total letters
***** No hits found ******
Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
Posted date: Dec 19, 2003 4:42 PM
Number of letters in database: 14,728
Number of sequences in database: 90
BLASTN 2.2.2 [Dec-14-2001]
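The report above shows the signature of the problem: a `Query=` line followed by `(0 letters)`. A small pre-check over the raw BLAST text can list such queries before the output is handed to the parser. This is only a sketch in current Python syntax; `find_empty_queries` is a hypothetical helper, and it assumes the query name fits on the single `Query=` line, which may not hold for long definition lines.

```python
import re

# Matches the "(N letters)" line that follows a Query= line.
_LETTERS = re.compile(r"^\s*\(([\d,]+) letters\)")


def find_empty_queries(lines):
    """Return the Query= names whose reported length is 0 letters."""
    empty = []
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("Query="):
            current = stripped[len("Query="):].strip()
            continue
        match = _LETTERS.match(line)
        if match and current is not None:
            if int(match.group(1).replace(",", "")) == 0:
                empty.append(current)
            current = None
    return empty


# Lines copied from the report in this message.
report = """BLASTN 2.2.2 [Dec-14-2001]
Query= lcl|97633|sp=CYB 296
(0 letters)
90 sequences; 14,728 total letters
***** No hits found ******
"""
empty = find_empty_queries(report.splitlines())
```

Any name returned here points at a query that was sent to BLAST with no residues.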
##############################
Useful pieces of code
def safe_readline(handle):
    """safe_readline(handle) -> line
    Read a line from an UndoHandle and return it. If there are no more
    lines to read, I will raise a SyntaxError.
    """
    line = handle.readline()
    if not line:
        # File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
        raise SyntaxError, "Unexpected end of stream."
    return line

consumer.start_database_report()
while 1:
    read_and_call(uhandle, consumer.database, start=' Database')
    # Database can span multiple lines.
    read_and_call_until(uhandle, consumer.database, start=' Posted')
    read_and_call(uhandle, consumer.posted_date, start=' Posted')
    read_and_call(uhandle, consumer.num_letters_in_database,
                  start=' Number of letters')
    read_and_call(uhandle, consumer.num_sequences_in_database,
                  start=' Number of sequences')
    read_and_call(uhandle, consumer.noevent, start=' ')
    line = safe_readline(uhandle)  # NCBIStandalone.py, line 422, in _scan_database_report
    uhandle.saveline(line)
    if line.find('Lambda') != -1:
        break

def feed(self, handle, consumer):
    """S.feed(handle, consumer)
    Feed in a BLAST report for scanning. handle is a file-like
    object that contains the BLAST report. consumer is a Consumer
    object that will receive events as the report is scanned.
    """
    if isinstance(handle, File.UndoHandle):
        uhandle = handle
    else:
        uhandle = File.UndoHandle(handle)
    # Try to fast-forward to the beginning of the blast report.
    read_and_call_until(uhandle, consumer.noevent, contains='BLAST')
    # Now scan the BLAST report.
    self._scan_header(uhandle, consumer)
    self._scan_rounds(uhandle, consumer)
    self._scan_database_report(uhandle, consumer)  # NCBIStandalone.py, line 98, in feed
    self._scan_parameters(uhandle, consumer)
#######################################
Thanks in advance
Richard CHRISTEN
Champion de saut en epaisseur
UMR6543 CNRS - Université de Nice Sophia Antipolis
Centre de Biochimie
Parc Valrose
06108 Nice cedex2
tel 33 - 492 076 947
fax 33 - 492 076 408