[BioPython] BLAST/tutorial problems
Aaron Zschau
aaron at ocelot-atroxen.dyndns.org
Fri Jul 23 22:34:48 EDT 2004
I did some tracing back and made sure I was up to date with the
current CVS versions of all of Biopython, and my BLAST searches all
seem to be working again. However, the problem I described in my last
email seems to come from parsing the FASTA file generated by my
GenBank query.
The code is as follows:
--------------
file_for_blast = open(data_path_prefix + file_unique_id + 'fasta', 'r')
f_iterator = Fasta.Iterator(file_for_blast)
f_record = f_iterator.next()
--------------
According to the cookbook this should work. However, when I call
f_record = f_iterator.next(), I get the following error:
--------------
Traceback (most recent call last):
  File "cluster-debug.py", line 110, in ?
    sys.stdout.flush()
  File "/usr/lib/python2.2/site-packages/Bio/Fasta/__init__.py", line 72, in next
    result = self._iterator.next()
  File "/home/zschamm/bioinfo/biopython-1.30/build/lib.linux-i586-2.2/Martel/IterParser.py", line 152, in iterateFile
  File "/home/zschamm/bioinfo/biopython-1.30/build/lib.linux-i586-2.2/Martel/Parser.py", line 361, in parseString
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserIncompleteException: error parsing at or beyond character 0 (unparsed text remains)
--------------
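Since the error points at character 0, the parser seems to reject the
very start of the file. One possible cause (an assumption, not something
confirmed here) is that the file is empty or does not begin with a '>'
title line. A minimal sketch of a check that uses only plain Python; the
function name describe_fasta_start is hypothetical and not part of
Biopython:

```python
def describe_fasta_start(path):
    """Report whether the file at `path` begins the way a FASTA file should.

    Hypothetical helper for debugging only; it inspects just the first
    200 characters and does not validate the rest of the file.
    """
    with open(path, "r") as handle:
        head = handle.read(200)
    if not head:
        return "empty file"
    if head.startswith(">"):
        return "ok"
    if head.lstrip().startswith(">"):
        # A '>' is present, but blank lines or spaces come before it.
        return "leading whitespace before '>'"
    return "does not start with '>': %r" % head[:40]

# Example call (the path is whatever file the iterator was opened on):
# print(describe_fasta_start(data_path_prefix + file_unique_id + 'fasta'))
```

If this reports anything other than "ok", the file written after the
GenBank query is worth looking at before the iterator itself.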
If I simply replace the last line of my snippet (the f_iterator.next()
call) with a protein sequence entered directly as a string, my program
continues along just fine and the BLAST portions now run:
f_record =
"MNKRGKYTTLNLEEKMKVLSRIEAGRSLKSVMDEFGISKSTFYDIKKNKKLILDFVLKQDMPLVGAEKRKR
TTGAKYGDVDDAVYMWYQQKRSAGVPVRGVELQAAAERFARCFGRTDFKASTGWLFRFRNRHAIGNRKGCGE
QVLSSVSENVEPFRQKLSMIIKEEKLCLAQLYSGDETDLFWKSMPENSQASRKDICLPGKKINTERLSAFLC
ANADGTHKLKSIIIGKSKLPKSVKEDTSTLPVIYKPSKDVWFTRELFSEWFFQNFVPEVRHFQLNVLRFHDE
DVRALLLLDSCPAHPSSESLTSEDGRIKCMFFPHNSSTLIQPMNQGVILSCKRLYRWKQLEESLVIFEESDD
EQEKGDKGVSKIKIYNIKSAIFNWAKSWEEVKQITIANAWENLLYKKEPEYDFQGLEHGDYREILEKCGELE
TKLDDDRVWLNGDEEKGCLLKTKGGITKEVVQKGGEAEKQTAEFKLSAVRESLDYLLDFVDATPEFQRFHFT
LCEFSDDS"
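For comparison with the generated file, a bare sequence string like the
one above can be turned into well-formed FASTA text with a few lines of
plain Python. This is only a sketch: the name to_fasta, the title text,
and the default line width of 60 are assumptions for illustration, not
part of Biopython.

```python
def to_fasta(title, sequence, width=60):
    """Return `sequence` as FASTA text: one '>' title line, then wrapped lines.

    Hypothetical helper; any whitespace inside `sequence` (such as line
    breaks left over from pasting) is removed before wrapping.
    """
    sequence = "".join(sequence.split())
    lines = [">" + title]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# Example call, assuming f_record holds the sequence string:
# open("test.fasta", "w").write(to_fasta("test_protein", f_record))
```

A file written this way should start with '>' at character 0, which
gives a known-good input to try against the iterator.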
thanks,
Aaron
On Jul 23, 2004, at 3:35 PM, Jeffrey Chang wrote:
> Hi Aaron,
>
> NCBIDictionary requires parameters telling it what database to
> retrieve from, and what format it uses. I believe these changes were
> made when we switched over to using NCBI EUtils API to retrieve
> sequences. Try doing:
> >>> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank",
> parser=record_parser)
> >>> gb_record = ncbi_dict["6273291"]
> >>> print gb_record.seq
>
> I'm not sure why you are getting a timeout. Is there a web proxy or
> firewall blocking HTTP connections on your network?
>
> Jeff
>
>
>
> On Jul 23, 2004, at 3:23 PM, Aaron Zschau wrote:
>
>> Thanks for the help, though I am still having some trouble getting
>> things working. I am now getting a different timeout error:
>>
>> [error] [client 10.0.0.22] (20507)The timeout specified has expired:
>> ap_content_length_filter: apr_bucket_read() failed, referer:
>> http://serval.atroxen.com:8080/interface.html
>>
>> I tried updating to version 1.3 of biopython (I was running the
>> previous version) and now I get a type error:
>>
>>
>> Traceback (most recent call last):
>> File "cluster-debug.py", line 88, in ?
>> ncbi_dict = GenBank.NCBIDictionary(parser = record_parser)
>> TypeError: __init__() takes at least 3 non-keyword arguments (1 given)
>>
>> relating to this piece of code:
>>
>> record_parser = GenBank.FeatureParser()
>> ncbi_dict = GenBank.NCBIDictionary(parser = record_parser)
>> gb_record = ncbi_dict[gi_list[0]]
>>
>> this part worked just fine before the update to 1.3 and looking
>> through the posted API I haven't been able to figure out what
>> arguments are missing from the GenBank.NCBIDictionary creation.
>>
>> thanks
>>
>> Aaron
>>
>> On Jul 22, 2004, at 8:35 PM, Jeffrey Chang wrote:
>>
>>> On Jul 22, 2004, at 7:30 PM, Aaron Zschau wrote:
>>>
>>>> I recently started having my program hang during the part that does
>>>> BLAST queries.
>>>
>>> Yes, NCBI has recently changed their BLAST server, and broke the
>>> Biopython code. It has been fixed now in the CVS version, which you
>>> can retrieve at:
>>> http://cvs.biopython.org/
>>>
>>> Please grab the latest NCBIWWW.py file from there, and save it over
>>> the older version.
>>>
>>> The major change in this version is that this blast code has been
>>> deprecated in favor of the NCBI QBlast API, which should be more
>>> stable. Thus, after you install the new file, change the
>>> NCBIWWW.blast call to NCBIWWW.qblast in your code.
>>>
>>> Jeff
>>