[BioPython] BLAST/tutorial problems

Fri Jul 23 22:34:48 EDT 2004

I did some tracing back and made sure I was up to date with the
current CVS versions of all of Biopython, and my BLAST searches all
seem to be working again. However, the problem from my last email
seems to be in parsing the FASTA file generated by my GenBank query.

The code is as follows:
--------------
from Bio import Fasta

# Read the first record from the FASTA file written by the GenBank query
file_for_blast = open(data_path_prefix + file_unique_id + 'fasta', 'r')
f_iterator = Fasta.Iterator(file_for_blast)
f_record = f_iterator.next()
--------------

should work according to the Cookbook; however, when I call
f_record = f_iterator.next(), I get the following error:

--------------
Traceback (most recent call last):
  File "cluster-debug.py", line 110, in ?
    sys.stdout.flush()
  File "/usr/lib/python2.2/site-packages/Bio/Fasta/__init__.py", line 72, in next
    result = self._iterator.next()
  File "/home/zschamm/bioinfo/biopython-1.30/build/lib.linux-i586-2.2/Martel/IterParser.py", line 152, in iterateFile
  File "/home/zschamm/bioinfo/biopython-1.30/build/lib.linux-i586-2.2/Martel/Parser.py", line 361, in parseString
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserIncompleteException: error parsing at or beyond character 0 (unparsed text remains)
--------------
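The exception says parsing failed "at or beyond character 0", which
suggests the parser rejected the very first character of the file. A
FASTA file has to open with a '>' header line, so an empty file, or one
holding GenBank flat-file text (starting with LOCUS), would plausibly
fail this way. Below is a minimal pure-Python reader, written for
illustration only: read_fasta is a hypothetical name, not part of
Bio.Fasta, and it simply reports where the first non-header text shows up.

```python
def read_fasta(lines):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper for illustration; not part of Biopython.
    `lines` may be any iterable of strings, such as an open file.
    Raises ValueError if sequence data appears before the first '>'
    header, the kind of input a "character 0" parse error points to.
    """
    records = []
    header = None
    chunks = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            if header is not None:
                # Close off the previous record before starting a new one
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError(
                "line %d: expected a '>' header, found %r" % (lineno, line[:40])
            )
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Passing it the open file (whatever data_path_prefix + file_unique_id +
'fasta' names) should show whether the file is empty (an empty list
comes back) or starts with something other than a header.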

If I simply replace the last line of the code above with a protein
sequence entered directly, my program continues along just fine and the
BLAST portions now run:

--------------
# Protein sequence pasted directly (adjacent string literals are joined)
f_record = (
    "MNKRGKYTTLNLEEKMKVLSRIEAGRSLKSVMDEFGISKSTFYDIKKNKKLILDFVLKQDMPLVGAEKRKR"
    "TTGAKYGDVDDAVYMWYQQKRSAGVPVRGVELQAAAERFARCFGRTDFKASTGWLFRFRNRHAIGNRKGCGE"
    "QVLSSVSENVEPFRQKLSMIIKEEKLCLAQLYSGDETDLFWKSMPENSQASRKDICLPGKKINTERLSAFLC"
    "ANADGTHKLKSIIIGKSKLPKSVKEDTSTLPVIYKPSKDVWFTRELFSEWFFQNFVPEVRHFQLNVLRFHDE"
    "DVRALLLLDSCPAHPSSESLTSEDGRIKCMFFPHNSSTLIQPMNQGVILSCKRLYRWKQLEESLVIFEESDD"
    "EQEKGDKGVSKIKIYNIKSAIFNWAKSWEEVKQITIANAWENLLYKKEPEYDFQGLEHGDYREILEKCGELE"
    "TKLDDDRVWLNGDEEKGCLLKTKGGITKEVVQKGGEAEKQTAEFKLSAVRESLDYLLDFVDATPEFQRFHFT"
    "LCEFSDDS"
)
--------------
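Since the bare string works, one way to narrow things down is to write
that same sequence out in FASTA layout and check whether Fasta.Iterator
accepts a file that definitely starts with a '>' header. A minimal
sketch of producing that layout follows; to_fasta is a hypothetical
helper, and the 60-residue line width is a common convention rather
than a requirement.

```python
def to_fasta(header, sequence, width=60):
    """Return one FASTA record as text: a '>' header line followed by
    the sequence wrapped at `width` residues per line.

    Hypothetical helper for illustration; not part of Biopython.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    seq = "".join(sequence.split())  # drop any embedded spaces/newlines
    lines = [">" + header]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Writing to_fasta("test", f_record) to a file and feeding that file to
the iterator would separate "the parser is broken" from "the GenBank
step wrote something that is not FASTA".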

thanks,

Aaron

On Jul 23, 2004, at 3:35 PM, Jeffrey Chang wrote:

> Hi Aaron,
>
> NCBIDictionary requires parameters telling it which database to
> retrieve from and which format to use.  I believe these changes were
> made when we switched over to using the NCBI EUtils API to retrieve
> sequences.  Try doing:
> >>> from Bio import GenBank
> >>> record_parser = GenBank.FeatureParser()
> >>> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank", parser=record_parser)
> >>> gb_record = ncbi_dict["6273291"]
> >>> print gb_record.seq
>
> I'm not sure why you are getting a timeout.  Is there a web proxy or  
> firewall blocking HTTP connections on your network?
>
> Jeff
>
>
>
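At the EUtils level, the database and format arguments presumably end up
as the db and rettype parameters of NCBI's EFetch utility. The sketch
below only builds such a request URL with the Python 3 standard library
to show that mapping. It assumes the current eutils.ncbi.nlm.nih.gov
endpoint and rettype=gb for GenBank flat files; efetch_url is a
hypothetical name, and this is not how Biopython builds its requests.

```python
from urllib.parse import urlencode

# Current documented EFetch endpoint (assumption: the 2004 host may differ)
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(database, ids, rettype, retmode="text"):
    """Build an EFetch request URL for one or more record IDs.

    `database` and `rettype` play roughly the role of the database and
    format arguments NCBIDictionary now requires.
    """
    params = {
        "db": database,
        "id": ",".join(str(i) for i in ids),  # EFetch takes comma-separated IDs
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(params)
```

For the GI in Jeff's example, efetch_url("nucleotide", ["6273291"], "gb")
gives the URL that requests that record as a GenBank flat file.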
> On Jul 23, 2004, at 3:23 PM, Aaron Zschau wrote:
>
>> Thanks for the help, though I am still having some trouble getting  
>> things working.  I am now getting a different timeout error:
>>
>>  [error] [client 10.0.0.22] (20507)The timeout specified has expired: ap_content_length_filter: apr_bucket_read() failed, referer: http://serval.atroxen.com:8080/interface.html
>>
>> I tried updating to version 1.30 of Biopython (I was running the
>> previous version), and now I get a TypeError:
>>
>>
>> Traceback (most recent call last):
>>   File "cluster-debug.py", line 88, in ?
>>     ncbi_dict = GenBank.NCBIDictionary(parser = record_parser)
>> TypeError: __init__() takes at least 3 non-keyword arguments (1 given)
>>
>> relating to this piece of code:
>>
>> record_parser = GenBank.FeatureParser()
>> ncbi_dict = GenBank.NCBIDictionary(parser = record_parser)
>> gb_record = ncbi_dict[gi_list[0]]
>>
>> This part worked just fine before the update to 1.30, and after
>> looking through the posted API documentation I haven't been able to
>> figure out which arguments are missing from the
>> GenBank.NCBIDictionary call.
>>
>> thanks
>>
>> Aaron
>>
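The TypeError message counts self, so "at least 3 non-keyword arguments"
means the constructor needs two positional arguments beyond self (the
database and the format, per Jeff's reply), while the failing call
supplied only self plus the keyword parser ("1 given"). The stand-in
class below is hypothetical and only mirrors that shape, so the count
can be checked with the standard inspect module (Python 3).

```python
import inspect


class FakeNCBIDictionary:
    """Hypothetical stand-in, not Biopython's class: two required
    positional arguments (database, fmt) plus an optional parser."""

    def __init__(self, database, fmt, parser=None):
        self.database = database
        self.fmt = fmt
        self.parser = parser


def count_required_positional(func):
    """Count positional parameters that have no default value.

    Applied to an unbound __init__, this includes self, which is how
    Python 2 arrived at "takes at least 3 non-keyword arguments".
    """
    kinds = (inspect.Parameter.POSITIONAL_ONLY,
             inspect.Parameter.POSITIONAL_OR_KEYWORD)
    return sum(
        1
        for p in inspect.signature(func).parameters.values()
        if p.kind in kinds and p.default is inspect.Parameter.empty
    )
```

Calling FakeNCBIDictionary(parser=...) fails the same way the original
code did, while FakeNCBIDictionary("nucleotide", "genbank", parser=...)
matches the shape of the call Jeff suggests.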
>> On Jul 22, 2004, at 8:35 PM, Jeffrey Chang wrote:
>>
>>> On Jul 22, 2004, at 7:30 PM, Aaron Zschau wrote:
>>>
>>>> I recently started having my program hang during the part that does  
>>>> BLAST queries.
>>>
>>> Yes, NCBI recently changed their BLAST server, which broke the
>>> Biopython code.  It has now been fixed in the CVS version, which you
>>> can retrieve at:
>>> http://cvs.biopython.org/
>>>
>>> Please grab the latest NCBIWWW.py file from there, and save it over  
>>> the older version.
>>>
>>> The major change in this version is that this blast code has been  
>>> deprecated in favor of the NCBI QBlast API, which should be more  
>>> stable.  Thus, after you install the new file, change the  
>>> NCBIWWW.blast call to NCBIWWW.qblast in your code.
>>>
>>> Jeff
>>
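QBlast refers to NCBI's BLAST URL API: a search is submitted with
CMD=Put, the server answers with a request ID (RID), and the results are
fetched later with CMD=Get. The sketch below builds those two URLs with
the Python 3 standard library to illustrate the flow. The endpoint is
NCBI's current documented one (an assumption for the 2004 setup), the
helper names are hypothetical, and this is not NCBIWWW.qblast's actual
implementation, which also handles polling and typically sends the query
in a POST body rather than a URL.

```python
from urllib.parse import urlencode

# Current BLAST URL API endpoint (assumption: the 2004 server lived elsewhere)
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def qblast_put_url(program, database, query):
    """URL that submits a search (CMD=Put); the reply carries an RID."""
    return BLAST_URL + "?" + urlencode({
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    })


def qblast_get_url(rid, format_type="XML"):
    """URL that polls for, and eventually retrieves, the results of a Put."""
    return BLAST_URL + "?" + urlencode({
        "CMD": "Get",
        "RID": rid,
        "FORMAT_TYPE": format_type,
    })
```

In Biopython itself the change is just the call: NCBIWWW.qblast takes
the program, database and sequence in that order, e.g.
NCBIWWW.qblast("blastp", "nr", f_record).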