From rhf22 at mole.bio.cam.ac.uk Mon Dec 1 08:17:06 2003
From: rhf22 at mole.bio.cam.ac.uk (Rasmus Fogh)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] ScriptCentral
Message-ID: 

Hi,

I have written to you before about possibly joining up to the BioPython
project. I am from the CCPN project (http://www.ccpn.ac.uk/index.html).

We do not think we could do that at the moment, but we would be
interested in putting a link to our web page in ScriptCentral, if
possible. Clearly I need to join up, somehow, to get a userID and
password. Do you think this would be possible? How should I proceed?

Thanks,

Rasmus

---------------------------------------------------------------------------
Dr. Rasmus H. Fogh                  Email: r.h.fogh@bioc.cam.ac.uk
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK.     FAX (01223)766002

From chapmanb at uga.edu Mon Dec 1 09:10:37 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] ScriptCentral
In-Reply-To: 
References: 
Message-ID: <20031201141037.GA95612@evostick.agtec.uga.edu>

Hi Rasmus;

> I have written to you before about possibly joining up to the BioPython
> project. I am from the CCPN project (http://www.ccpn.ac.uk/index.html).
>
> We do not think we could do that at the moment, but we would be interested
> in putting a link to our web page in ScriptCentral, if possible.
> Clearly I need to join up, somehow, to get a userID and password.

That would be great. The ScriptCentral page is editable from the web
with the username 'biopython' and the password 'user' (no quotes on
either). From there you can click 'Edit this page' and then 'Add New',
and you can go forward and enter the contact information about the
page.

If you have any problems at all, feel free to send your information
(Name, Author, URL and Description) to me and I'd be happy to add it.
Thanks!

Hope this helps.
Brad

From jpaint at u.washington.edu Mon Dec 1 13:42:54 2003
From: jpaint at u.washington.edu (Jay Painter)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] mmLib 0.5 Released
Message-ID: <1070304174.3736.6.camel@d-128-95-235-174.dhcp4.washington.edu>

Hello,

I have just released a new version of mmLib (full description below).
This version includes a new monomer library based on the RCSB's
standard component library, classes for unit cell calculations, a full
space group library, and an enhanced GUI mmCIF editor written with the
PyGTK toolkit bindings.

Regards,
Jay Painter

The Python Macromolecular Library (mmLib) is a software toolkit and
library of routines for the analysis and manipulation of macromolecular
structural models, implemented in the Python programming language. It
is accessed via a layered, object-oriented application programming
interface, and provides a range of useful software components for
parsing mmCIF, PDB, and MTZ files, a library of atomic elements and
monomers, an object-oriented data structure describing biological
macromolecules, and an OpenGL molecular viewer.

The mmLib data model is designed to provide easy access to the various
levels of detail needed to implement high-level application programs
for macromolecular crystallography, NMR, modeling, and visualization.
This includes specialized classes for proteins, DNA, amino acids, and
nucleic acids. Also included are an extensive monomer library, an
element library, and specialized classes for performing unit cell
calculations combined with a full space group library.

From idoerg at burnham.org Fri Dec 12 14:28:59 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Error parsing a GenBank file
Message-ID: <3FDA16FB.2000209@burnham.org>

Hi,

I am getting an error on parsing a GenBank file. It's the E. coli K12
genome, downloaded from:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000913

Egads!
How do I turn Martel's debug feature on? Setting it to level 2 in
FeatureParser didn't seem to do much...

Thanks,

Iddo

>>> gb_iter = GenBank.Iterator(open('ecoli_k12.gb'),fp)
>>> fp = GenBank.FeatureParser()
>>> cur_rec = gb_iter.next()
Traceback (most recent call last):
  File "", line 1, in ?
  File "/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 142, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 229, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 1251, in feed
  File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 1414769

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From idoerg at burnham.org Fri Dec 12 14:51:04 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] SORRY!
Message-ID: <3FDA1C28.4030200@burnham.org>

Re: my previous message. Not a bug. Needed the bleeding-edge CVS
version. Everything's hunky-dory.

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From chapmanb at uga.edu Fri Dec 12 14:53:45 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Error parsing a GenBank file
In-Reply-To: <3FDA16FB.2000209@burnham.org>
References: <3FDA16FB.2000209@burnham.org>
Message-ID: <20031212195345.GF6895@evostick.agtec.uga.edu>

Hi Iddo;

> I am getting an error on parsing a GenBank file. It's the E. coli K12
> Genome, downloaded from:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000913

It parses for me with the current CVS. Peter added the /selenocysteine
tag just a week ago, and yup, it looks like your file has this tag. So
yeah, just update to CVS and all should be good. Ah, GenBank and their
continuously expanding tag list.

> Egads! How do I turn Martel's debug feature on? Setting it to level 2 in
> FeatureParser didn't seem to do much...

Huh? Now you are really confusing me. The following code:

from Bio import GenBank

fp = GenBank.FeatureParser(debug_level = 2)
gb_iter = GenBank.Iterator(open('ecoli_k12.gb'), fp)
gb_iter.next()

spits out tons of debug messages a la:

Match ' /EC_numb' (x=8978): ' '
Match ' /EC_numbe' (x=8979): '\\/'
Match ' /EC_number' (x=8980): '[^"]'
Match ' /EC_number=' (x=8981): '[^"]'

It does take a few moments before the debug info gets printed, though.
Just wait and you'll see all the magic.

Hope this helps.
Why-can't-anyone-parse-small-little-files-anymore-ly yr's,
Brad

> >>> gb_iter = GenBank.Iterator(open('ecoli_k12.gb'),fp)
> >>> fp = GenBank.FeatureParser()
> >>> cur_rec = gb_iter.next()
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 142, in next
>     return self._parser.parse(File.StringHandle(data))
>   File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 229, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 1251, in feed
>   File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 328, in parseFile
>     self.parseString(fileobj.read())
>   File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 356, in parseString
>     self._err_handler.fatalError(result)
>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>     raise exception
> Martel.Parser.ParserPositionException: error parsing at or beyond
> character 1414769
>
> --
> Iddo Friedberg, Ph.D.
> The Burnham Institute
> 10901 N. Torrey Pines Rd.
> La Jolla, CA 92037
> USA
> Tel: +1 (858) 646 3100 x3516
> Fax: +1 (858) 646 3171
> http://ffas.ljcrf.edu/~iddo
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev

From rgb2003 at med.cornell.edu Tue Dec 16 22:59:34 2003
From: rgb2003 at med.cornell.edu (Robert G. Bussell)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Contribution -- NMR xpk files
Message-ID: 

Hello,

I would like to contribute some tools that I have developed to the
biopython project. Among them are programs for analyzing NMR data as
well as modules suited for more general problems of handling resonance
assignment data.

The tool set that I would like to contribute deals with structural NMR
resonance assignment data read in from a standard, easily generated
file format (xpk or peaklist files) recognized by nmrview, a freely
distributed, open source program with a large user base.
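(For readers unfamiliar with the format: an xpk peaklist is essentially a
whitespace-delimited table behind a short header, six lines, with the column
labels on the last header line, as the code later in this message assumes. A
minimal, hypothetical reader along those lines, not part of Robert's actual
tool set:)

```python
def read_peaklist(path, headerlen=6):
    """Minimal reader for an nmrview-style .xpk peaklist.

    Assumes, as the predictnoe2.py script below does, that the first
    `headerlen` lines are header, with the column labels on the last
    header line.  Returns (labels, rows), where each row is the list
    of whitespace-separated fields of one data line.
    """
    with open(path) as infile:
        header = [infile.readline() for _ in range(headerlen)]
        labels = header[-1].split()
        rows = [line.split() for line in infile if line.strip()]
    return labels, rows
```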
The code that I have written facilitates the process of extracting
scientifically relevant information from the data. Of additional
utility is the module upon which the main program depends, which can be
used as a building block for constructing code to deal with nmrview
peaklists and other resonance assignment data read in from some other
file format.

I'd be thrilled to introduce these tools into the public domain through
biopython and to provide more, related tools as feedback warrants them
and as I understand more about where they fit into the biopython
project.

Thanks to everybody who supports the biopython and other open source
projects, and I look forward to learning from your comments, testing
and suggestions.

Sincerely,
Robert Bussell, Jr.
rgb2003@med.cornell.edu

-------------- Program and instructions -------------

I would appreciate your feedback and testing. Here are some
instructions for setting things up on your computer. Please let me know
if you use these tools, even if you don't have a specific comment.

NOTE: At the moment the program and modules do not depend on biopython
in any way. It should only be necessary to have python installed to run
them.

(1) Create or locate an existing directory for the modules I provide
below. Copy all three of the modules (readtools, writetools and
xpktools) into that directory.

(2) Edit the line in predictnoe2.py that reads

    sys.path=[sys.path,"/usr/people/robert/bin/python/lib"]

so that it points to your local directory that contains the modules
listed in step 1.

(3) The first line of the program

    #! /usr/bin/env python

may have to be modified to point to the python interpreter on your
computer.

(4) Cut and paste the input peaklist (noed.xpk) from below (or use your
own data).
You will need to specify this file location as an input parameter to
predictnoe2.py.

(5) Make predictnoe2.py executable and invoke it with the command:

    predictnoe2.py --inxpk noed.xpk --outxpk noex.xpk --increment 1
        --detectatom H1 --fromatom 15N2 --relayatom N15

NOTE: This mode sets up an i->i+1 and i->i-1 prediction where the
directly detected proton is attached to the N15 nitrogen and the 15N2
atom is attached to the proton from which the coherence originates.

(6) Inspect the output in the --outxpk file (noex.xpk if following step
5 literally) and/or load it into nmrview and overlay it on a spectrum.

CAVEAT: Be careful in cutting and pasting the code, especially if you
are unfamiliar with the xpk data format. It could easily be scrambled
if it acquires false line feeds.

-------------BEGIN PROGRAM: predictnoe2.py------------------
#! /usr/bin/env python

# predictnoe2.py: A python script that predicts neighbor NOE locations
# Generates a peaklist of predicted i->i+n and i->i-n NOE crosspeaks
# from a peaklist of self peaks.

# **Input arguments:**
#   --inxpk       input peaklist
#   --outxpk      output peaklist
#   --fromatom    atom in xpk head corresponding to i of i->i+n noe
#   --relayatom   label for the relay atom
#   --detectatom  label for detected atom
#   --increment   n, where noes are between i and i+n

# Example input:
#   predictnoe2.py --inxpk dpeak.xpk --outxpk xpeak.xpk
#       --increment 1 --detectatom H1 --fromatom 15N2 --relayatom N15
#
# *******************
# Known input file assumptions: Header should be six lines long and
# contain the list of data labels as the last line
#
# ****TO DO LIST****
# Add automatic prediction of forward and reverse noes
# Test the endpoint predictions (are all possible crosspeaks predicted?)
# ***** LOAD MODULES *****
import types
import getopt
import string
import sys

sys.path=[sys.path,"/usr/people/robert/bin/python/lib"]

import xpktools

# ***** FUNCTION DEFINITIONS *****

def get_res(infile,match,dict,headerlen):
    n=0
    i=0
    line=infile.readline()
    while (line):
        # Read past the header
        i=i+1
        if (i>=headerlen):
            res=string.split(string.split(line)[1],".")[0]
            if (res==match):
                n=n+1
                key=res+" "+str(n)
                dict[key]=line
        line=infile.readline()

def find_label_cols(labels, labelswanted):
    # Find the column number for to, from and relay atoms
    # using the xpk header and user inputs fromlabel and tolabel
    # Input -- the label line from the xpk
    # (like this for ex. H1.L H1.P H1.W...more stuff...int stat)
    # Return values: col number for fromlabel and tolabel

    # ** LOCAL INITS **
    datamap={}
    fromlabel   = labelswanted[0]
    relaylabel  = labelswanted[1]
    detectlabel = labelswanted[2]
    labellist=string.splitfields(string.split(labels,"\012")[0])

    # Make a data map of the label and ppm values of atoms of interest
    for i in range(len(labellist)):
        if (fromlabel+".P"==labellist[i]):   datamap["fromppm"]=i
        if (detectlabel+".P"==labellist[i]): datamap["detectppm"]=i
        if (relaylabel+".P"==labellist[i]):  datamap["relayppm"]=i
        if (fromlabel+".L"==labellist[i]):   datamap["fromassign"]=i
        if (detectlabel+".L"==labellist[i]): datamap["detectassign"]=i
        if (relaylabel+".L"==labellist[i]):  datamap["relayassign"]=i
    return datamap

def get_ave_cs(list,col):
    # Get the average of the chemical shift
    sum=0
    n=0
    for element in list:
        sum=sum+string.atof(string.split(element)[col+1])
        n=n+1
    return sum/n

def read_xpk(infile,dict,headerlen):
    # Read xpk files into a dictionary of lists
    # The dictionary entries are indexed by the "to" atom of the noe
    # (i.e. the detected amide proton for a nh-nh noe experiment)
    #
    # Each list contains lines from the xpk file with a common first
    # dimension residue assignment.
    # The peaklist header is also returned but not included in the dict
    #
    # Special dictionary elements:
    #   "maxres" the maximum residue number
    #   "minres" the minimum residue number

    header=[]     # This will hold the header lines
    i=0           # line counter
    maxres=-1     # maximum residue number
    minres=-1     # minimum residue number

    line=infile.readline()
    while (line):
        if (i<headerlen):           # Store the header lines
            header.append(line)
        i=i+1
        if (i>=headerlen+1):        # Read past the header
            res=string.split(string.split(line)[1],".")[0]

            # Check min and max and update values as necessary
            [maxres,minres]=update_min_max(res,minres,maxres)

            if dict.has_key(str(res)):
                # Append the additional data about this residue to a list
                templst=dict[str(res)]
                templst.append(line)
                dict[str(res)]=templst
            else:
                # This is a new residue, start a new list
                dict[str(res)]=[line]   # Use [] for list type
        line=infile.readline()

    # Add the max and min statistics to the dictionary
    dict["maxres"]=maxres
    dict["minres"]=minres

    return header

def predict_xpeaks(dict,fromres,tores,datamap,count):
    # Predict the position of the fromres->tores NOE crosspeak
    # Nomenclature:
    #   "fromres" --> "tores"  ==  "i --> i+inc"

    # *LOCAL INITS*
    predict=[]
    fromreslist=dict[str(fromres)]   # Residue 1 data line
    toreslist=dict[str(tores)]       # Residue 2 data line

    # ** Get averages of ppm values for coordinates of each residue **
    # Only the "relay" and "detect" atom data need to be calculated
    # since the from characteristics will be left in place
    avefromppm = get_ave_cs(fromreslist,datamap["fromppm"])

    # ** Change the hn and n assignments and chem shifts to other res **
    # Base the new line on the "to" residue data line, substituting the
    # from data.  Also change the first element (line count)
    for line in toreslist:
        fromlabel = str(fromres) + ".n"   # ABSTRACT THIS
        line = xpktools.replace_entry(line,datamap["fromppm"]+2,avefromppm)
        line = xpktools.replace_entry(line,datamap["fromassign"]+2,fromlabel)
        line = xpktools.replace_entry(line,1,count)
        predict.append(line)

    return predict

def parse_args():
    opts=getopt.getopt(sys.argv[1:],'',
        ['inxpk=','outxpk=','fromatom=','detectatom=',
         'increment=','relayatom='])

    # Start each argument out empty so missing ones can be caught below
    inxpk=outxpk=fromatom=detectatom=increment=relayatom=''

    for elem in opts[0]:
        if (elem[0]=="--inxpk"):      inxpk=elem[1]
        if (elem[0]=="--outxpk"):     outxpk=elem[1]
        if (elem[0]=="--fromatom"):   fromatom=elem[1]
        if (elem[0]=="--detectatom"): detectatom=elem[1]
        if (elem[0]=="--increment"):  increment=elem[1]
        if (elem[0]=="--relayatom"):  relayatom=elem[1]

    if (inxpk=='' or outxpk=='' or increment=='' or detectatom==''
            or fromatom=='' or relayatom==''):
        input_args_needed_error()
        sys.exit(0)

    return string.atoi(increment), inxpk, outxpk, detectatom, relayatom, fromatom

def input_args_needed_error():
    print "These input arguments are needed for program execution:"
    print "--inxpk"
    print "--outxpk"
    print "--increment"
    print "--detectatom"
    print "--fromatom"
    print "--relayatom"

def input_args_warning(progname):
    print progname, "Error -- please check your input arguments."
    print progname, "Quitting."

def cols_not_found_warning(progname):
    print progname, "Error -- One or more data columns not found."
    print progname, "         Try checking your from, to and relay atoms."
    print progname, "Quitting."
def update_min_max(res,minres,maxres):
    # This function takes care of updating the values of maxres and
    # minres so that they reflect the global max and min residue
    # values in the peaklist
    res=string.atoi(res)
    if (res>0):
        if (minres<0):     # takes care of initialization where minres=-1
            minres=res
        if (maxres<0):     # takes care of initial value maxres=-1
            maxres=res
        if (minres>res):   # found a smaller min, replace
            minres=res
        if (maxres<res):   # found a larger max, replace
            maxres=res
    return [maxres,minres]

# ***** MAIN PROGRAM *****
# (The setup lines below were lost in the archived posting; they are
# reconstructed from the functions defined above.)

[inc, inxpk, outxpk, detectatom, relayatom, fromatom] = parse_args()

infile=open(inxpk,'r')
outfile=open(outxpk,'w')

dict={}
header=read_xpk(infile,dict,6)   # Six header lines assumed (see above)
infile.close()

MAXRES=dict["maxres"]
MINRES=dict["minres"]

# Locate the data columns for the from, relay and detect atoms
datamap=find_label_cols(header[len(header)-1],
                        [fromatom,relayatom,detectatom])

# Copy the header to the output peaklist
xpktools.write_list(outfile,header)

# Predict the i->i+inc and i->i-inc noe positions if possible
# Write each one to the output file as they are calculated
count=0        # A counter for numbering the output data lines
res=MINRES     # minimum should be the lowest i value
while (res<=MAXRES):
    if ( dict.has_key(str(res)) and dict.has_key(str(res+inc)) ):
        xpktools.write_list(outfile,predict_xpeaks(dict,res,res+inc,datamap,count))
    if ( dict.has_key(str(res)) and dict.has_key(str(res-inc)) ):
        xpktools.write_list(outfile,predict_xpeaks(dict,res,res-inc,datamap,count))
    count=count+1
    res=res+1
outfile.close()

------------- END PROGRAM: predictnoe2.py --------------

------------- BEGIN MODULE: xpktools.py -----------------
# xpktools.py: A python module containing function definitions and
# classes useful for manipulating data from nmrview .xpk peaklist files.
#
# ********** INDEX of functions and classes **********
#
# xpkentry:  Handles xpk data one line at a time, generating attributes
#            that are often sought after, such as the chemical shifts
#            and assignments of the first three dimensions.
#            Using this class to define a variable requires both the
#            data line from the xpk file itself and the label line
#            (seventh line) of the xpk header file.  To get that line,
#            use the function get_header_line(infile) in this module.
import string

class xpkentry:
    # Usage: xpkentry(xpkentry,xpkheadline) where xpkentry is the line
    # from an nmrview .xpk file and xpkheadline is the line from the
    # header file that gives the names of the entries, which is
    # typically the sixth line of the header (counting fm 1)
    #
    # Variables are accessed by their name in the header line, as in
    # self.field["H1.P"] will return the H1.P entry for example.
    # self.field["linenum"] returns the line number (1st field of line)
    # self.d1l=first dimension atom label
    # self.d1p=first dimension chemical shift
    #   -------- same for dim up to 4 -------

    def __init__(self,entry,headline):
        self.field={}   # Holds all fields from input line in a
                        # dictionary keyed to the label line in the
                        # xpk file

        datlist  = string.split(entry)
        headlist = string.split(headline)

        # Parse the entry into a field dictionary
        self.field["linenum"]=datlist[0]
        i=1
        # (reconstructed: the loop body and first-dimension assignments
        # below were lost in the archive)
        while i<len(datlist):
            self.field[headlist[i-1]]=datlist[i]
            i=i+1

        # Assign the dimension labels and chemical shifts to special
        # variables
        self.d1l=datlist[1]
        self.d1p=datlist[2]
        if (len(datlist)>=9):
            self.d2l=datlist[7]
            self.d2p=datlist[8]
        if (len(datlist)>=15):
            self.d3l=datlist[13]
            self.d3p=datlist[14]

        # Assign the general peak properties to special variables
        self.stat = datlist[len(datlist)-1]
        self.int  = datlist[len(datlist)-2]
        self.vol  = datlist[len(datlist)-3]

def get_header_line(infile):
    i=1
    while (i<7):
        line=infile.readline()
        i=i+1
    return line

def replace_entry(line,fieldn,newentry):
    # Replace an entry in a string by the field number
    # No padding is implemented currently.  Spacing will change if
    # the original field entry and the new field entry are of
    # different lengths.
    # This method depends on xpktools.find_start_entry
    start=find_start_entry(line,fieldn)
    leng=len(string.splitfields(line[start:])[0])
    newline=line[:start]+str(newentry)+line[(start+leng):]
    return newline

def write_list(outfile,list):
    for line in list:
        outfile.write(line)

def find_start_entry(line,n):
    # find the starting point character for the n'th entry in
    # a space delimited line.
    # n is counted starting with 1
    # The n=1 field by definition begins at the first character

    infield=0      # A flag that indicates that the counter is in a field

    if (n==1):
        return 0   # Special case

    # Count the number of fields by counting spaces
    c=1
    leng=len(line)

    # Initialize variables according to whether the first character
    # is a space or a character
    if (line[0]==" "):
        infield=0
        field=0
    else:
        infield=1
        field=1

    # (reconstructed: the loop below was lost in the archive)
    # Walk along the line, counting field starts, until the n'th is found
    while (c<leng and field<n):
        if (infield):
            if (line[c]==" "):
                infield=0
        else:
            if (line[c]!=" "):
                infield=1
                field=field+1
        c=c+1
    return c-1

------------- END MODULE: xpktools.py -----------------

From chapmanb at uga.edu Wed Dec 17 13:29:32 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Contribution -- NMR xpk files
In-Reply-To: 
References: 
Message-ID: <20031217182932.GD53012@evostick.agtec.uga.edu>

Hi Robert;

> I would like to contribute some tools that I have developed to the
> biopython project.

Great! We definitely always welcome contributions.

> Among them are programs for analyzing NMR data as well
> as modules suited for more general problems of handling resonance
> assignment data.

I will admit straightaway that I know next to nothing about structural
data, so I won't be able to make any comments about the actual work the
code is doing (Heh. Predict NOEs -- I don't even know what an NOE that
I got predicted is. Heh.). But I can make comments at least from the
style and usability aspects, and others who know about structural data
can help me out.

The first major point is that most things in Biopython are organized as
modules that can be called from other functions. From looking at your
code (and not really understanding structural things) it looks like the
two major things the code does are deal with the xpk files and then do
the NOE prediction. If I am on target, then it might be best to
organize your code as a couple of modules under Bio.NMR, called
something like xpktools.py and NOEPredict.py or something similar. This
way the predictnoe.py script can call the useful functions from these
modules, and they can also be reused from other people's scripts. If
structural people have other ideas about where this functionality
should be located, please chime in.
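(As a concrete illustration of the reuse being described here: the
field-replacement idea at the heart of xpktools.replace_entry above can be
written as a small stand-alone helper. This is a simplified sketch, not the
actual Biopython module, and unlike the original it does not preserve the
line's column spacing:)

```python
def replace_field(line, fieldn, newentry):
    """Replace the fieldn'th whitespace-delimited field (counting from
    1, as xpktools.replace_entry does) with str(newentry).

    Simplification: the line is re-joined with single spaces, so the
    original column spacing is not preserved.
    """
    fields = line.split()
    fields[fieldn - 1] = str(newentry)
    return " ".join(fields)

# Renumber the first field of an xpk-style data line:
print(replace_field("0 32.HN 8.653 0.021", 1, 99))
# -> 99 32.HN 8.653 0.021
```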
The second point is that I noticed, just on rapid examination, that
some class and function names don't conform to the Biopython style
guide that we use. Specifically, classes are normally named in
AllFirstLetterUppercase style, and internal functions (those that
aren't meant to be called from other scripts using the modules) are
differentiated with _underscores_in_front. Jeff wrote up a nice guide
about contributing to Biopython which has these points and additional
info:

http://www.biopython.org/docs/developer/contrib.html

But yeah, after all that -- we definitely would like to have your code,
as it doesn't (to my knowledge) duplicate anything we already have in
Biopython. To sum up, my major suggestions would be to:

1. Read over the contribution and style guide for the code.
2. Organize the functionality as modules and make it clear, by
   underscores or some other method, which functions are meant to be
   called by other modules.
3. Have the example script use the modules as an example.
4. Make sure you are willing to put your code under the Biopython
   license.

Thanks for your mail and the code!
Brad

From Richard.Christen at unice.fr Fri Dec 19 12:19:59 2003
From: Richard.Christen at unice.fr (christen)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Blast Parser error
Message-ID: <033e01c3c654$549354c0$2b113b86@christen2002>

Hi there,

I got a problem with the Blast parser.

##############################
The biology thing:

I have been using the blast parser in some kind of a loop to blast n
sequences against themselves, and then parse the output to build a
distance matrix. I use a sliding window to extract different parts of
the n sequences and -- how stupid not to check it, but I did not expect
it -- one sequence was much shorter, so I sent blast a sequence of zero
length :-(

This is confirmed by the log of formatdb; blast thus provides only a
warning. (Note that formatdb does not return the proper lcl|id of the
sequence!
I will send a mail to ncbi about that.)

========================[ Dec 19, 2003 4:42 PM ]========================
Version 2.2.2 [Dec-14-2001]
Started database file "D:\Bases\Bac16S\BLAST\Chapon_4133-P"
WARNING: [000.000] lcl|50 has zero-length sequence
Formatted 90 sequences

As a result I got an error in the parser.

##############################
Error messages:

Traceback (most recent call last):
  File "test.py", line 24, in ?
    b_record = b_iter.next()   # fetch the next query
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1331, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 556, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 98, in feed
    self._scan_database_report(uhandle, consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 422, in _scan_database_report
    line = safe_readline(uhandle)
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."

##############################
test.py sample:

...
b_parser=NCBIStandalone.BlastParser()                # create the parser
b_iter=NCBIStandalone.Iterator(blast_out, b_parser)  # create the iterator
...
23  while 1:
24      b_record = b_iter.next()   # fetch the next query
25
26      if b_record is None:
27          break                  # no more Query= responses to read...

##############################
blast output, section concerned:

BLASTN 2.2.2 [Dec-14-2001]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), "Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs", Nucleic Acids Res. 25:3389-3402.
Query= lcl|97633|sp=CYB 296
         (0 letters)

Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
           90 sequences; 14,728 total letters

 ***** No hits found ******

  Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
    Posted date:  Dec 19, 2003  4:42 PM
  Number of letters in database: 14,728
  Number of sequences in database: 90

BLASTN 2.2.2 [Dec-14-2001]

##############################
useful pieces of code:

def safe_readline(handle):
    """safe_readline(handle) -> line

    Read a line from an UndoHandle and return it.  If there are no more
    lines to read, I will raise a SyntaxError.

    """
    line = handle.readline()
    if not line:
        # ParserSupport.py, line 411, in safe_readline (from traceback)
        raise SyntaxError, "Unexpected end of stream."
    return line

    consumer.start_database_report()
    while 1:
        read_and_call(uhandle, consumer.database, start='  Database')
        # Database can span multiple lines.
        read_and_call_until(uhandle, consumer.database, start='    Posted')
        read_and_call(uhandle, consumer.posted_date, start='    Posted')
        read_and_call(uhandle, consumer.num_letters_in_database,
                      start='  Number of letters')
        read_and_call(uhandle, consumer.num_sequences_in_database,
                      start='  Number of sequences')
        read_and_call(uhandle, consumer.noevent, start=' ')

        # NCBIStandalone.py, line 422, in _scan_database_report:
        line = safe_readline(uhandle)
        uhandle.saveline(line)
        if line.find('Lambda') != -1:
            break

def feed(self, handle, consumer):
    """S.feed(handle, consumer)

    Feed in a BLAST report for scanning.  handle is a file-like object
    that contains the BLAST report.  consumer is a Consumer object that
    will receive events as the report is scanned.

    """
    if isinstance(handle, File.UndoHandle):
        uhandle = handle
    else:
        uhandle = File.UndoHandle(handle)

    # Try to fast-forward to the beginning of the blast report.
    read_and_call_until(uhandle, consumer.noevent, contains='BLAST')

    # Now scan the BLAST report.
    self._scan_header(uhandle, consumer)
    self._scan_rounds(uhandle, consumer)
    # NCBIStandalone.py, line 98, in feed:
    self._scan_database_report(uhandle, consumer)
    self._scan_parameters(uhandle, consumer)

#######################################

Thanks in advance,

Richard CHRISTEN
Champion de saut en epaisseur
UMR6543 CNRS - Université de Nice Sophia Antipolis
Centre de Biochimie
Parc Valrose
06108 Nice cedex2
tel 33 - 492 076 947
fax 33 - 492 076 408

From benita at cshl.edu Tue Dec 23 09:44:40 2003
From: benita at cshl.edu (Yair Benita)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Interpro parser
Message-ID: 

Hi all,

I found some code to work with the online version of Interpro in
biopython. However, I couldn't find any code to parse the results. I
run Interpro locally and am able to produce xml output. Did anyone make
a parser for such an output?

I attach an example of the Interpro output. It's clear this should be
easy to parse, but I am not sure what would be best in that case. Any
suggestions?

Yair
--
Yair Benita
Pharmaceutical Proteomics
Utrecht University

-------------- next part --------------
Skipped content of type multipart/appledouble
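(The attached example was stripped by the list software, but the general
approach can be sketched with the standard library's XML tools. The tag and
attribute names below, "protein", "interpro", "id" and "name", are
illustrative placeholders to be checked against the real InterProScan
output, not its actual schema:)

```python
import xml.etree.ElementTree as ET

def parse_interpro(path):
    """Collect (protein id, match id, match name) tuples from an
    InterProScan-style XML file.

    NOTE: the tag names ("protein", "interpro") and the attributes
    ("id", "name") are illustrative guesses -- adjust them to match
    the XML that your local InterProScan actually emits.
    """
    hits = []
    for protein in ET.parse(path).getroot().iter("protein"):
        for match in protein.iter("interpro"):
            hits.append((protein.get("id"), match.get("id"), match.get("name")))
    return hits
```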