From dyoo at hkn.eecs.berkeley.edu Mon Jan 5 14:19:41 2004
From: dyoo at hkn.eecs.berkeley.edu (Danny Yoo)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Spam: How about restricting posting privileges to only subscribers?
Message-ID:

Hi everyone,

The majority of spam that I'm receiving nowadays is coming from the
biopython-development discussion list. It's really disheartening to try
reading the archives, and come up with nothing but spam.

I know that Mailman has a feature to restrict posting privileges, to keep
spam from flooding the list --- would it be possible to turn that on?

From chapmanb at uga.edu Mon Jan 5 17:34:01 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Blast Parser error
In-Reply-To: <033e01c3c654$549354c0$2b113b86@christen2002>
References: <033e01c3c654$549354c0$2b113b86@christen2002>
Message-ID: <20040105223401.GC9588@evostick.agtec.uga.edu>

Hello;
Sorry about the delay in responding -- I haven't been touching a
computer for the past couple of weeks; very good for my mental
health.

> I got a problem with the Blast parser
[...]
> I use a sliding window to extract different parts of the n sequences
> and, how stupid not to check it but I did not expect it, one sequence was much
> shorter, so I sent to blast a sequence of zero length :-(

I ran into a similar problem a while back in my own work, in which I
was BLASTing some bad quality sequences which were completely screened
out -- the results of doing this look very similar to what you've
included below.

The solution I devised at the time was the BlastErrorParser, which
you use exactly like the BlastParser.
So instead of:

    b_parser = NCBIStandalone.BlastParser()  # call the parser

you would do:

    b_parser = NCBIStandalone.BlastErrorParser()

The error parser tries to diagnose errors and, if it sees a
recognizable error, will raise NCBIStandalone.LowQualityBlastError,
which at least lets you know that you've reached broken BLAST output
and can toss it away safely. So your loop would look like:

    while 1:
        try:
            b_record = b_iter.next()
        except NCBIStandalone.LowQualityBlastError:
            # deal with getting this error however you like
            continue

[...]
> blast output, section concerned
> BLASTN 2.2.2 [Dec-14-2001]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= lcl|97633|sp=CYB 296
>          (0 letters)
>
> Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
>            90 sequences; 14,728 total letters
>
>
>
>  ***** No hits found ******
>
>   Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
>     Posted date:  Dec 19, 2003  4:42 PM
>   Number of letters in database: 14,728
>   Number of sequences in database:  90

However, having said all that, I'm not sure if the error parser will
pick up your problem, and I wonder if there isn't a line missing
between the Database and 'No hits found' lines. Normally BLAST should
have a line starting with 'Searching' here -- in the case of my bad
sequences you'd see:

    Searchingdone

instead of the normal:

    Searching.............done

If this is not the case perhaps we can find some other way to
diagnose these files as having the error. If you'd attach the BLAST
output from just this sequence we could work on that.

Hopefully this helps!
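
A rough sketch of the "some other way to diagnose these files" idea, based only on the two symptoms visible in this thread: a query reported as "(0 letters)" and a "Searching" line with no progress dots (or no such line at all). The function name diagnose_blast_report and its return labels are hypothetical and are not part of Biopython; this is a plain-text heuristic, not how BlastErrorParser works internally.

```python
import re


def diagnose_blast_report(text):
    """Classify a plain-text BLAST report using the symptoms from this thread.

    Returns one of the (hypothetical) labels:
      "empty-query"    - the query line reports "(0 letters)"
      "no-search-line" - there is no "Searching...done" line at all
      "low-quality"    - the line reads "Searchingdone" with no progress dots
      "ok"             - none of the above symptoms were found
    """
    # Symptom 1: a zero-length query, as in the report quoted above.
    if re.search(r"^\s*\(0 letters\)", text, re.MULTILINE):
        return "empty-query"

    # Symptom 2: the "Searching" progress line is missing or has no dots.
    match = re.search(r"^Searching(\.*)done", text, re.MULTILINE)
    if match is None:
        return "no-search-line"
    if match.group(1) == "":
        return "low-quality"
    return "ok"


empty_query = "Query= lcl|97633|sp=CYB 296\n         (0 letters)\n\nDatabase: x\n"
screened_out = "Query= a\n         (120 letters)\n\nSearchingdone\n"
normal = "Query= a\n         (120 letters)\n\nSearching.............done\n"
```

A check like this could run on each report before it is handed to a parser, so that reports labelled anything other than "ok" are skipped or logged.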
Brad

From chapmanb at uga.edu Mon Jan 5 17:53:25 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Interpro parser
In-Reply-To:
References:
Message-ID: <20040105225325.GA65516@evostick.agtec.uga.edu>

Hi Yair;

> I found some code to work with the online version of Interpro in biopython.
> However, I couldn't find any code to parse the results. I run Interpro
> locally and am able to produce xml output. Did anyone make a parser for such
> an output?

I don't believe so. From what I've seen of InterPro in Biopython,
Kate wrote some code in Bio/WWW/InterPro.py to retrieve entries and
then some code in Bio/InterPro to deal with parsing the web output
into a Record object. I've never used the code myself and have not
seen Kate around in a while, so feel free to take a look at it and
see if any of the objects (perhaps the InterPro Record object) are
useful to you.

> I attach an example of the Interpro output. It's clear this should be easy to
> parse, but I am not sure what would be best in that case.

The best thing to do would be to fire up a straight XML parser
included in the Python distribution (I've found the SAX parser to
work just fine) and parse it into a standard Record object.

We'd be very happy to have a full treatment of InterPro in Biopython
so if you want to have a go at this, feel free. Perhaps Kate will
chime in if she has an interest in extending her code to include XML
output and local InterPro.

Thanks for looking at this!
Brad

From chapmanb at uga.edu Mon Jan 5 18:13:37 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Spam: How about restricting posting privileges to only subscribers?
In-Reply-To:
References:
Message-ID: <20040105231337.GB65516@evostick.agtec.uga.edu>

Hi Danny;

> The majority of spam that I'm receiving nowadays is coming from the
> biopython-development discussion list.
> It's really disheartening to try
> reading the archives, and come up with nothing but spam.

Yeah, I've been wondering to myself about what exactly is going on
-- it seems like biopython-dev is letting way more spam through than
the other open-bio lists. I'm not exactly sure why this is, but I'm
not really the best person to comment on this as I know nothing
about the mailing list setup -- Jeff or Chris, do you guys have any
clue why biopython-dev is seeing so much more spam?

> I know that Mailman has a feature to restrict posting privileges, to keep
> spam from flooding the list --- would it be possible to turn that on?

Yeah, right now the list is pretty bad since it is almost all noise
and no signal -- if we can't fix the problem another way we might
have to do that.

Thanks for bringing it up and making us actually do something about
it.
Brad

From jeffrey.chang at duke.edu Mon Jan 5 18:37:36 2004
From: jeffrey.chang at duke.edu (Jeffrey Chang)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Spam: How about restricting posting privileges to only subscribers?
In-Reply-To: <20040105231337.GB65516@evostick.agtec.uga.edu>
References: <20040105231337.GB65516@evostick.agtec.uga.edu>
Message-ID: <2496165E-3FD8-11D8-9372-000A956845CE@duke.edu>

On Jan 5, 2004, at 6:13 PM, Brad Chapman wrote:

> Hi Danny;
>
>> The majority of spam that I'm receiving nowadays is coming from the
>> biopython-development discussion list. It's really disheartening to try
>> reading the archives, and come up with nothing but spam.
>
> Yeah, I've been wondering to myself about what exactly
> is going on -- it seems like biopython-dev is letting way more spam
> through than the other open-bio lists. I'm not exactly sure why this
> is, but I'm not really the best person to comment on this as I know
> nothing about the mailing list setup -- Jeff or Chris, do you guys
> have any clue why biopython-dev is seeing so much more spam?

I've been wondering this myself.
I don't know anything about the mailing list setup, so am not much
help either. However, the spam filter seems to be active, as it's
still tagging some mail as spam. It's odd that so much crud is still
getting through, though.

Jeff

From jpaint at u.washington.edu Mon Jan 5 19:11:55 2004
From: jpaint at u.washington.edu (Jay Painter)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Spam: How about restricting posting privileges to only subscribers?
In-Reply-To: <2496165E-3FD8-11D8-9372-000A956845CE@duke.edu>
References: <20040105231337.GB65516@evostick.agtec.uga.edu> <2496165E-3FD8-11D8-9372-000A956845CE@duke.edu>
Message-ID: <1073347915.3817.3.camel@localhost.localdomain>

How about restricting posting to the subscriber list? Spam filters are
pretty useless.

From matthewh at vetsci.usyd.edu.au Tue Jan 6 16:42:17 2004
From: matthewh at vetsci.usyd.edu.au (Matthew Hobbs)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] spam
In-Reply-To: <200401062119.i06LJhNS006041@portal.open-bio.org>
References: <200401062119.i06LJhNS006041@portal.open-bio.org>
Message-ID: <200401070842.17651.matthewh@vetsci.usyd.edu.au>

Hi,

Yes please do something like this! I have been considering withdrawing
from the mailing list because of the spam, and this would be a pity.

Matthew

On Wed, 7 Jan 2004 08:19 am, biopython-dev-request@portal.open-bio.org wrote:
> Message: 10
> Date: 05 Jan 2004 16:11:55 -0800
> From: Jay Painter
> Subject: Re: [Biopython-dev] Spam: How about restricting posting
>         privileges to only subscribers?
> To: Jeffrey Chang
> Cc: biopython-dev@biopython.org, Brad Chapman
> Message-ID: <1073347915.3817.3.camel@localhost.localdomain>
> Content-Type: text/plain
>
> How about restricting posting to the subscriber list? Spam filters are
> pretty useless.
--
Matthew Hobbs
Research Fellow in Bioinformatics
Faculty of Veterinary Science
Building B19, University of Sydney, NSW 2006, Australia
Phone: +61 2 9351 3154
Fax: +61 2 93513957

From jeffrey.chang at duke.edu Wed Jan 7 10:50:18 2004
From: jeffrey.chang at duke.edu (Jeffrey Chang)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Spam: How about restricting posting privileges to only subscribers?
In-Reply-To: <20040105231337.GB65516@evostick.agtec.uga.edu>
References: <20040105231337.GB65516@evostick.agtec.uga.edu>
Message-ID: <313DAB95-4129-11D8-AB3D-000A956845CE@duke.edu>

It turns out that biopython-dev was configured differently than the
other lists (e.g. biopython), and some of the MIME spams were getting
through. Chris Dagdigian has updated the filters now, so hopefully
we'll see a reduction in the number of spams that get through the
list. Many thanks for looking into this, Chris!

Jeff

From rgb2003 at med.cornell.edu Fri Jan 9 15:15:32 2004
From: rgb2003 at med.cornell.edu (Robert G. Bussell)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Contribution -- NMR xpk files
Message-ID:

Dear Biopython users and developers,

I have contributed some code to biopython for working with NMR data to be
included in the CVS, probably in the NMR package. Along with the two modules
(xpktools.py and NOEtools.py) is an example script (simplepredict.py) and
an input file (noed.xpk). I think you will find the example script to be
well documented and readable.

Selected Features
-- data classes to handle NMR data directly from the .xpk peaklist file
-- provides easy, intuitive access to all data in the peaklist file
-- a one line method that generates a data table from a peaklist file
-- generates a predicted crosspeak .xpk file from diagonal assignments

This new functionality will enable biopython users to perform analysis and
data extraction of NMR data whether in the form of data tables or directly
from .xpk peaklist files.
If you give this code a try please let me know, especially if you have
feedback, suggestions or questions.

Looking forward to hearing from you.

Bob
rgb2003@med.cornell.edu

From hoffman at ebi.ac.uk Thu Jan 15 09:46:01 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] test_Crystal and test_Pathway
Message-ID:

I noticed that the unittests for these modules were not actually
getting run by run_tests.py because they did not have a run_tests()
function. I fixed this and generated new test output, and checked in
the four files.

I figured this would be okay since these files do not even go into the
built distribution and the changes have little chance of messing
anything up unless the tests no longer work for you. They do work fine
for me.

If this is not okay I can back these changes out...
--
Michael Hoffman
European Bioinformatics Institute

From hoffman at ebi.ac.uk Thu Jan 15 10:01:51 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Bio.PDB.PDBList SyntaxError
Message-ID:

$ sudo python setup.py install
running install
running build
running build_py
running build_ext
running install_lib
byte-compiling /usr/lib/python2.3/site-packages/Bio/PDB/PDBList.py to PDBList.pyc
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBList.py", line 174
    os.system("%s %s" % (uncompress, filename))
    ^
SyntaxError: invalid syntax
running install_data

There are too many tabs on this line. I can fix if you want.
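
An indentation problem like the one above only shows up when the file is byte-compiled during install. As a minimal sketch (not something the Biopython build does, as far as this thread shows), Python's built-in compile() can be used to check source text for syntax and indentation errors before checking it in. The helper name find_syntax_error and the two sample strings are hypothetical.

```python
def find_syntax_error(source, filename="<string>"):
    """Return (line number, message) for the first syntax error, or None.

    IndentationError and TabError are subclasses of SyntaxError, so
    over-indented or tab-mangled lines are reported here as well.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.msg
    return None


# A correctly indented snippet and one whose last line has stray tabs.
good_source = "import os\nif True:\n    os.getcwd()\n"
bad_source = "import os\nif True:\n    x = 1\n\t\t\ty = 2\n"

good_result = find_syntax_error(good_source)
bad_result = find_syntax_error(bad_source)
```

Reading each module's text and passing it through a helper like this would flag the offending file without running an install.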
--
Michael Hoffman
European Bioinformatics Institute

From hoffman at ebi.ac.uk Thu Jan 15 10:10:25 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Re: Bio.PDB.PDBList SyntaxError
In-Reply-To: <200401151540.10244.thamelry@vub.ac.be>
References: <200401151540.10244.thamelry@vub.ac.be>
Message-ID:

On Thu, 15 Jan 2004, Thomas Hamelryck wrote:

> On Thursday 15 January 2004 16:01, Michael Hoffman wrote:
> > There are too many tabs on this line. I can fix if you want.
>
> OK - thanks!

Done.
--
Michael Hoffman
European Bioinformatics Institute

From hoffman at ebi.ac.uk Thu Jan 15 10:33:01 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] Bio.Wise checked in
Message-ID:

I have checked in Bio.Wise, which contains modules for running and
processing the output of some of the models in the Wise package
available from :

Bio.Wise.psw for protein Smith-Waterman alignments
Bio.Wise.dnal for Smith-Waterman DNA alignments

There are also appropriate unit tests, which will not be run if dnal
is not in your path.

Please let me know if you have any problems or suggestions.
--
Michael Hoffman
European Bioinformatics Institute

From pwilkinson00 at hotmail.com Thu Jan 15 17:09:49 2004
From: pwilkinson00 at hotmail.com (Peter Wilkinson)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] RE: SPAM
Message-ID:

Matthew Hobbs,

I normally administer another list, but after I got this digest (and the
previous few), I have got tired of the spam taking up space in my mail as
well.

I have set the list so that you must be a member to post. That will cut the
spam down to almost nothing. This in no way will prevent new members from
joining.

Peter W.
+----------------------------------+
Peter Wilkinson
Bioinformatics Consultant
Lady Davis Institute
+----------------------------------+

From Wagied.Davids at ebc.uu.se Fri Jan 30 15:48:10 2004
From: Wagied.Davids at ebc.uu.se (Wagied)
Date: Sat Mar 5 14:43:30 2005
Subject: [Biopython-dev] hmmpfam parser
Message-ID: <401AC30A.E639BD2E@ebc.uu.se>

Hi,

I have some code which is able to parse hmmer output, as well as code
donated by Joanne Adamkewicz from Exelixis.
If you guys/gals find it useful, updates and modifications will follow!

Wagied Davids
Dept. of Molecular Evolution
Uppsala University
Sweden

-------------- next part --------------
###########################################################################
# Copyright (c) 1997-2004 Exelixis Pharmaceuticals, Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY CLAIM, DAMAGES OR OTHER LIABILITY WHATSOEVER,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION OR
# OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
# USE, PERFORMANCE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Module Notes :
#
#   hmmpfam related routines
#
# Original Authors : Joanne Adamkewicz, Darren Platt
#
###########################################################################

import re

# To make valid hyperlinks, append the domain accession number to the end of the url string.
accno2url = {'PF': 'http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?',
             'SM': 'http://smart.embl-heidelberg.de/smart/do_annotation.pl?BLAST=DUMMY&ACC=',
             'CO': 'http://www.ncbi.nlm.nih.gov/cgi-bin/COG/COG_info.plx?',
             'TI': 'http://www.tigr.org/tigr-scripts/CMR2/hmm_report.spl?user=access&password=access&acc='
             }

def parseHmmpfam(outputfile, debug=0):
    """
    Purpose:
        hmmpfam is one of the algorithms in Sean Eddy's HMMER software.
        This function will parse hmmpfam output that was generated either
        WITH or WITHOUT the --acc option
            If with, the first column of every hit and domain contains the accno (PF00595)
            If without, the 1st column contains the domain name (pkinase).
        You can't get both in the same output. Whichever one is present is
        returned as the first element in each tuple. In the code below, the
        variable is 'accno' for simplicity.

        **NOTE THAT NAMES DO NOT NECESSARILY UNIQUELY DETERMINE ACCNOS, if you
        are running against an HMM library with models from more than one source.
        i.e., names are unique within Pfam, but not when you merge both Pfam
        and Smart models. Example: SM00542 and PF05965 both have the name 'FYRC'.
        Therefore, if both are in your library, you can't determine which model
        gave the hit unless you use --acc. Also, Pfam itself changes the names
        of its models occasionally, but accnos are stable.
        You are STRONGLY RECOMMENDED to get in the habit of running hmmpfam
        with the --acc option!

    Arguments:
        outputfile - (string) - path to text file containing raw hmmpfam output
        debug - 0 or 1 - whether to print debugging statements to stdout

    Returns:
        [A,B] list, where A and B are each tuples of data as follows:
        A = hit results - 5 element tuple of:
            accno of hmm - string
            score - float
            evalue - float
            N (number of occurrences of hmm domain in the seq) - int
            description of hmm - string
        B = domain results - 11 element tuple of:
            accno of hmm - string
            x/y - string (e.g. 1/2 = '1 of 2' -- identifies which occurrence of the domain)
            seq-from - int
            seq-to - int
            model-from - int
            model-to - int
            score - float
            eval - float
            modelAlign - seq string of the model in the alignment
            matchString - consensus string for model<->seq alignment
            subjectAlign - seq string of sequence in the alignment
        Note that proteins with no domain hits will return ( [], [] )

    Raises:
        'ParseError', with error message to indicate the problem
    """
    if debug:
        print "Now opening file '%s'" % outputfile
    s = open(outputfile, 'r')
    trigger="Description.*Score.*E-value.*N" # header for first data chart
    hits = []       # will be returned
    domains = []    # temporary data holder
    domains2 = []   # will be returned
    stage = 0       # parser has 5 stages, from top to bottom
    while 1:
        line = s.readline()
        if debug:
            print line
            print "stage = %s" % stage
        if not line or line == None: # EOF
            break
        if stage == 0:
            #
            # Passing over all the header stuff until we get to the first data table
            #
            if re.search(trigger,line):
                s.readline()
                if debug:
                    print "**Change to stage 1**"
                stage = 1
        elif stage == 1:
            #
            # Reading hits from first data table
            #
            if line.find('no hits') != -1:
                # This protein was not hit by any models
                # return empty lists
                if debug:
                    print "No hits found!"
                return (hits, domains)
            cols = line.split()
            if len(cols) == 0:
                # Blank line = end of table
                if debug:
                    print "**Change to stage 2**"
                stage = 2
            else:
                accno = cols[0] # if run without --acc option, this is actually model name, not accno
                desc = ' '.join(cols[1:-3]) # note that model descriptions are truncated to 38 char in raw output
                score = float(cols[-3])
                eval = float(cols[-2])
                n = int(cols[-1])
                hits.append( (accno, score, eval, n, desc) )
        elif stage == 2:
            #
            # Waiting for domains
            #
            if line.find('Domain') != -1:
                stage = 3
                if debug:
                    print "**Change to stage 3**"
                s.readline()
        elif stage == 3:
            #
            # Parsing domains
            #
            cols = line.split()
            if len(cols) == 0:
                stage = 4
                if debug:
                    print "**Change to stage 4**"
            else:
                accno = cols[0]
                count = cols[1] # x of y
                seqFrom = int(cols[2])
                seqTo = int(cols[3])
                hmmFrom = int(cols[5])
                hmmTo = int(cols[6])
                score = float(cols[-2])
                eval = float(cols[-1])
                # Note this 8-element tuple doesn't include the alignment
                # strings, because we haven't gotten there yet.
                domains.append( (accno, count, seqFrom, seqTo, hmmFrom, hmmTo, score, eval) )
        elif stage == 4:
            #
            # Waiting for alignments
            #
            if line.startswith('Alignments of top-scoring domains'):
                stage = 5
                if debug:
                    print "**Change to stage 5**"
        elif stage == 5:
            #
            # Parsing alignments - see sample raw output for help understanding this code
            #
            # At the start of each new model hit alignment, there is an info line with accno,
            # score, and e-value of the hit. Then comes the alignment itself in three-line
            # groups (model, consensus, sequence), with a blank line between each group, until
            # the end of the domain is reached.
            # The parser numbers the lines with variable names as follows:
            #   one - the info line, only occurs once per domain
            #   two - model sequence
            #   three - consensus sequence
            #   four - protein sequence
            #   five - blank line
            # Thus, two three four and five will repeat several times for each 'one' line.
            # Each time, we will grab and append the contents to the variable of the appropriate name.
            #
            # Note that the model sequence begins and ends with a * character, which helps us
            # find the last group-of-three in our parsing.
            #
            # One complication: for some domain outputs, an extra 'RF' line is inserted above
            # the model sequence ('two') line in each group of three, making it a group of four.
            # This parser can handle that case; it ignores the line (doesn't return it).
            #
            one = line # get the current line
            if one.strip() == '//':
                # End-of-output marker found
                return (hits,domains2);
            two=''
            three=''
            four=''
            check=s.readline()
            # check for, and ignore, the extra 'RF' lines as they appear
            if check.strip() and check.split()[0] == 'RF':
                check=s.readline()
            st=check.index('*')
            begin= check.strip()
            end=len(begin)
            if begin.strip() and begin.split()[0] == 'RF':
                begin= s.readline().strip()
            if begin.count('*') == 2:
                # Both the start and the end of the model sequence are on this line,
                # so the entire thing only consists of one three-line group.
                two = begin.strip()[3:-3] # model line starts with *-> and ends with <-*, ignore those chars
                three = s.readline().strip()
                four = s.readline().split()[2] # the seq line has flanking start and stop integers, we don't want those
                five = s.readline()
                if five.strip() != '':
                    raise "ParseError",'Stage 5: expected blank line'
            else:
                # This is the more usual case, the alignment is spread out over
                # >1 three-line group.
                flag=1
                read =1
                while (flag==1):
                    two = two + begin.strip() # append current 'two' line
                    temp=s.readline() # this is 'three'
                    if temp == None:
                        # Yikes! - hit end of file parsing, shouldn't
                        # happen.
                        raise "ParseError",'unexpected end of PFam result'
                    temp = temp[st:st+end] # read only the characters from * to the end
                    three += temp # append current 'three' line
                    four += s.readline().split()[2] # see note above about flanking integers
                    five = s.readline()
                    if five.strip() != '':
                        raise "ParseError",'Stage 5: expected blank line'
                    if begin.strip()[-1] == '*':
                        # We hit the end of the model sequence
                        flag=0
                    if read != 0:
                        begin= s.readline().strip()
                        if begin.strip() and begin.split()[0] == 'RF':
                            # skip the RF line
                            begin= s.readline().strip()
                        if begin.strip()[-1]=='*':
                            read=0
                three=three[3:-1] # append the last bit we found
                two=two[3:-3]
            # We are done parsing this domain alignment, save the results and go to next domain
            domains2.append(domains[0]+(two,three,four))
            domains = domains[1:]
    # All done!
    return (hits, domains2)

# 'RF' example: resultid 14693470
-------------- next part --------------
hmmpfam - search one or more sequences against HMM database
HMMER 2.2g (August 2001)
Copyright (C) 1992-2001 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /usr/local/biotools/lib/Pfam
Sequence file:            query
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: gi|24649473|ref|NP_651199.2|
Accession:      [none]
Description:    [none]

Scores for sequence family classification (score includes all domains):
Model    Description                             Score  E-value   N
-------- -----------                             -----  ------- ---
SM00045  Diacylglycerol kinase accessory domai   262.2  7.8e-76   1
PF00609  Diacylglycerol kinase accessory domai   237.0  2.9e-68   1
PF00781  Diacylglycerol kinase catalytic domai   167.8    2e-47   1
SM00046  Diacylglycerol kinase catalytic domai   166.6  4.6e-47   1
SM00109  Protein kinase C conserved region 1 (   149.8  5.3e-42   3
PF00130  Phorbol esters/diacylglycerol binding   142.6  7.6e-40   3
PF00788  Ras association (RalGDS/AF-6) domain     60.5    4e-15   1
SM00314  Ras association (RalGDS/AF-6) domain     58.4  1.8e-14   2
PF03107  DC1 domain                               25.0  0.00019   2
COG1597  Predicted kinase related to diacylgly   -73.3   0.0053   1
SM00360  RNA recognition motif                    11.0   0.0059   1
PF00076  RNA recognition motif. (a.k.a. RRM, R    11.2    0.015   1
PF00628  PHD-finger                               -9.1     0.09   1
SM00249  PHD zinc finger                           6.2     0.18   1
SM00184  Ring finger                              -0.8     0.47   1
PF01500  Keratin, high sulfur B2 protein         -83.9      1.1   1
PF04928  Poly(A) polymerase central domain       -47.5      1.2   1
SM00361  RNA recognition motif                    -4.9      1.3   1
COG0284  Orotidine-5'-phosphate decarboxylase,  -107.0      2.3   1
PF02376  CUT domain                              -42.8      3.1   1
PF00412  LIM domain                              -24.0      3.1   1
SM00215  von Willebrand factor (vWF) type C do   -22.0      3.9   1
SM00217  Four-disulfide core domains             -21.1        4   1
PF04041  Domain of unknown function (DUF377)    -168.6      4.9   1
PF04014  SpoVT / AbrB like domain                -14.1      5.1   1
PF03768  Attacin, N-terminal region              -14.7      5.2   1
PF00219  Insulin-like growth factor binding pr   -23.6      5.4   1
PF01021  TYA transposon protein                 -296.2      5.6   1
PF03154  Atrophin-1 family                      -731.6      5.8   1
SM00343  zinc finger C2HC, DNA-binding            -4.4      6.3   1
PF04236  Tc5 transposase C-terminal domain       -23.8      6.3   1
SM00336  B-Box-type zinc finger, protein inter   -18.9      6.4   1
PF03208  Prenylated rab acceptor (PRA1)          -89.5      6.5   1
PF00503  G-protein alpha subunit                -308.3      6.9   1
COG0008  Glutamyl- and glutaminyl-tRNA synthet  -325.3      7.1   1
PF03792  PBX domain                             -116.2      7.7   1
PF03302  Giardia variant-specific surface prot  -292.4      8.7   1
PF04395  Poxvirus B22R protein                  -557.5      8.9   1
PF04396  Protein of unknown function, DUF537     -40.4        9   1
SM00463  Small MutS-related domain               -20.0      9.4   1
SM00157  Major prion protein                     -87.0      9.5   1
PF01391  Collagen triple helix repeat (20 copi   -86.6      9.5   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
PF00130    1/3       6    55 ..     1    51 []    74.6  2.3e-19
SM00109    1/3       6    55 ..     1    61 []    68.3  1.8e-17
PF03107    1/2      17    48 ..     1    44 []     4.9     0.39
PF01500    1/1      32   183 ..     1   177 []   -83.9      1.1
PF03302    1/1      36   303 ..     1   412 []  -292.4      8.7
PF00412    1/1      36    81 ..     1    62 []   -24.0      3.1
PF00130    2/3      69   116 ..     1    51 []    13.1  0.00045
SM00109    2/3      69   116 ..     1    61 []    30.5  4.3e-06
SM00336    1/1      77   106 ..     1    51 []   -18.9      6.4
PF03107    2/2      80   109 ..     1    44 []    20.1   0.0057
PF00628    1/1      81   119 ..     1    51 []    -9.1     0.09
SM00249    1/1      81   152 ..     1    39 []     6.2     0.18
SM00184    1/1      82   151 ..     1    23 []    -0.8     0.47
PF04236    1/1      91   140 ..     1    69 []   -23.8      6.3
SM00215    1/1     100   166 ..     1   105 []   -22.0      3.9
PF00130    3/3     135   185 ..     1    51 []    54.9  1.9e-13
SM00109    3/3     135   185 ..     1    61 []    51.0  2.9e-12
SM00217    1/1     138   174 ..     1    51 []   -21.1        4
SM00343    1/1     147   162 ..     1    17 []    -4.4      6.3
PF00219    1/1     148   197 ..     1    84 []   -23.6      5.4
PF04928    1/1     183  1018 ..     1   205 []   -47.5      1.2
PF04395    1/1     189   835 ..     1  1361 []  -557.5      8.9
PF03154    1/1     256  1058 ..     1  1046 []  -731.6      5.8
PF01021    1/1     416   830 ..     1   440 []  -296.2      5.6
PF03768    1/1     469   519 ..     1    72 []   -14.7      5.2
SM00157    1/1     678   903 ..     1   221 []   -87.0      9.5
COG0008    1/1     697  1149 ..     1   592 []  -325.3      7.1
PF03792    1/1     706   876 ..     1   209 []  -116.2      7.7
PF01391    1/1     761   818 ..     1    60 []   -86.6      9.5
PF00503    1/1     819  1019 ..     1   362 []  -308.3      6.9
SM00314    1/2     831   922 ..     1   102 []     9.9    0.011
PF04041    1/1     887  1149 ..     1   363 []  -168.6      4.9
PF00788    1/1     923  1024 ..     1   113 []    60.5    4e-15
SM00314    2/2     923  1024 ..     1   102 []    48.5  1.7e-11
PF04396    1/1     937  1001 ..     1   115 []   -40.4        9
PF02376    1/1     969  1022 ..     1    88 []   -42.8      3.1
SM00361    1/1    1031  1103 ..     1    91 []    -4.9      1.3
SM00360    1/1    1031  1103 ..     1   121 []    11.0   0.0059
PF00076    1/1    1032  1102 ..     1    77 []    11.2    0.015
PF03208    1/1    1098  1256 ..     1   162 []   -89.5      6.5
COG1597    1/1    1117  1512 ..     1   332 []   -73.3   0.0053
PF00781    1/1    1119  1267 ..     1   154 []   167.8    2e-47
SM00046    1/1    1119  1267 ..     1   157 []   166.6  4.6e-47
COG0284    1/1    1333  1511 ..     1   266 []  -107.0      2.3
SM00045    1/1    1334  1489 ..     1   195 []   262.2  7.8e-76
PF00609    1/1    1334  1489 ..     1   190 []   237.0  2.9e-68
SM00463    1/1    1346  1416 ..     1    94 []   -20.0      9.4
PF04014    1/1    1447  1487 ..
1 47 [] -14.1 5.1 Alignments of top-scoring domains: PF00130: domain 1 of 3, from 6 to 55: score 74.6, E = 2.3e-19 *->HrFkrttfyksptfCdhCgellwglakQGlkCsnCglnvHkrChekV H F+ +tf ++pt+C hC +llwgl+ QG+ C++C++ +H+rC++ V gi|2464947 6 HSFVKKTF-HKPTYCHHCSDLLWGLIQQGYICEVCNFIIHERCVSSV 51 ptnC<-* +t+C gi|2464947 52 VTPC 55 SM00109: domain 1 of 3, from 6 to 55: score 68.3, E = 1.8e-17 *->Hkfvfrtf.kptfCdvCrksiwgsfkqaaksqglrCseCkvkcHkkC H+fv++tf+kpt+C++C +++wg+ +q g+ C++C++ +H++C gi|2464947 6 HSFVKKTFhKPTYCHHCSDLLWGLIQQ-----GYICEVCNFIIHERC 47 aekvpaqshksglsC<-* ++ v +C gi|2464947 48 VSSVVT-------PC 55 PF03107: domain 1 of 2, from 17 to 48: score 4.9, E = 0.39 *->fsCdvCerkidpgsngffYsCskeegCndeeetsdyfvhdvrCa<-* C C + + +Y C + Cn f+++ rC+ gi|2464947 17 TYCHHCSDLLWG-LIQQGYICEV---CN--------FIIHERCV 48 PF01500: domain 1 of 1, from 32 to 183: score -83.9, E = 1.1 *->qtScCGfptCStlgtrPsCGsscCQPsCCe...SCCQpsCcqpSCCq q+ C C C+ss P C + C + gi|2464947 32 QGYICEV--CNFII-HERCVSSVVTP--CSgiaPCIIKNPVAHCWSE 73 PtcsqtscCqPtcfqs..........sCCrPsCcqTSCCq....PtCcqs Pt +C +c ++++ + C ++ Cq+ P C + gi|2464947 74 PTHHKRKFCT-VCRKRldetpavhclVCEYFAHIE---CQdfavPDCTEN 119 ssCqtgCgigGsiGyGQeGsSGAvScrirWCRPdCrvegtClPpCCvvsC +g v + W R ++t++ C +C gi|2464947 120 ATYVPGKELL------------NVKHQHHW-R-EGNLPSTSKCAYCKKTC 155 taPTCCqpvsaQasCCRPsCqPyCgqsCCRPaCccsvtCtrTccePc<-* ++ C + ++ +Cg C+ + P+ gi|2464947 156 WSSECLTGYRCE----------WCG-MTTHAGCRMY--------LPT 183 PF03302: domain 1 of 1, from 36 to 303: score -292.4, E = 8.7 *->CaeCklGyelsadktkcetsaPPdCkveNCkaCsnekeeNevCeeCn C+ C +++ +c++s v C + +N v C gi|2464947 36 CEVCN--FII---HERCVSSV-----VTPCSGIAPCIIKNPV-AHCW 71 SgfyLtpnTsqCidaCakiGnyYaqTnaqnKkiCkeCtvAnCktCedqGq S p T C + T a C C+ + +C d + gi|2464947 72 S----EP-THHKRKFCTVCRKRLDETPA---VHCLVCEYFAHIECQDFAV 113 cqaCndGfYksGdaCsPChes..cKTCsgGTaSdCTeCltGkaLrYGnDg + + +Y G + ++ + + + S+C C gi|2464947 114 PDCTENATYVPGKELLNVKHQhhWREGNLPSTSKCAYC------------ 151 TKGtC.GegCttgtGaGPaCkTCGLtIDGtsYCSeCateteyPqNGVCtS K tC C tg C+ CG t 
gi|2464947 152 -KKTCwSSECLTGY----RCEWCGMT------------------------ 172 taaRatatCkdstvanGvCssCanGyl..rmnGGCYeTtKfPGKSVCeea + a C yl++ n G + +P SV + gi|2464947 173 ----THAGCR--------------MYLptECNFGILQPIYLPPHSVSIPR 204 ngggDTCqkeapGYkldsgdLvvCSeGCktCtssTvCttCadGyvkdggs + +vk+ s gi|2464947 205 TEVP--------------------------------IEAIIGVQVKSKTS 222 dv....CtkCDssCeTCTaGatttCktCaTGYYKsgtgcvkCtssesdSn v++ +C+ D sC aG+ + +g + + + gi|2464947 223 LVrdysCPSPDLSCPIPGAGSGSL----------TSLGLKELLELHRQ-R 261 gitGVkgClsCAPP.snnkGSV.lCYLikdss.sGGnSTNKSGLSTGAIA l PP++ + GS++lC ss + G N gi|2464947 262 LEQSKQHFLLSTPPtPTSCGSIsLCHSPTPSSlTVGETSN---------- 301 GIsVAvviVVGGLVGFLCWWFiCRGKA<-* A gi|2464947 302 -------------------------EA 303 PF00412: domain 1 of 1, from 36 to 81: score -24.0, E = 3.1 *->CagCnkpIydrevvrralnkvwHpeCFrCavCgkpLtegdefyekdg C Cn I++r+v+ + +p++ + +k+ gi|2464947 36 CEVCNFIIHERCVS-SVV---------------TPCSGIAPCIIKNP 66 skelYC..khDyyklfg<-* C +++ ++k+++ gi|2464947 67 -V-AHCwsEPTHHKRKF 81 PF00130: domain 2 of 3, from 69 to 116: score 13.1, E = 0.00045 *->HrFkrttfyksptfCdhCgellwglakQGlkCsnCglnvHkrChekV H++ t ++ +fC +C+++l ++ ++C +C + +H +C+ ++ gi|2464947 69 HCWSEPTH-HKRKFCTVCRKRLDETP--AVHCLVCEYFAHIECQDFA 112 ptnC<-* +++C gi|2464947 113 VPDC 116 SM00109: domain 2 of 3, from 69 to 116: score 30.5, E = 4.3e-06 *->Hkfvfrtf.kptfCdvCrksiwgsfkqaaksqglrCseCkvkcHkkC H ++++t++k++fC vCrk++ + ++ C +C + +H C gi|2464947 69 HCWSEPTHhKRKFCTVCRKRLDETP-------AVHCLVCEYFAHIEC 108 aekvpaqshksglsC<-* ++++ + +C gi|2464947 109 QDFAVP-------DC 116 SM00336: domain 1 of 1, from 77 to 106: score -18.9, E = 6.4 *->eraplCeeHgd..eepaeffCveedgallCrdCdeageHqanklfrg +++ C++++++ +e+ + C ++C+ + H gi|2464947 77 HKRKFCTVCRKrlDETPAVHC---------LVCEYF-AH-------- 105 Hrvvll<-* + gi|2464947 106 -----I 106 PF03107: domain 2 of 2, from 80 to 109: score 20.1, E = 0.0057 *->fsCdvCerkidpgsngffYsCskeegCndeeetsdyfvhdvrCa<-* + C vC++++d+ + +C + C+ yf h+ +C gi|2464947 80 KFCTVCRKRLDE---TPAVHCLV---CE-------YFAHI-ECQ 109 PF00628: 
domain 1 of 1, from 81 to 119: score -9.1, E = 0.09 *->yCsvCgkvdddaggdllqCDgCdrwfHlaClgppleeppegkWlCpe +C vC+k d + + C C+ H++C ++ + gi|2464947 81 FCTVCRKRLD--ETPAVHCLVCEYFAHIECQDFAVP----------D 115 Ctpk<-* Ct+ gi|2464947 116 CTEN 119 SM00249: domain 1 of 1, from 81 to 152: score 6.2, E = 0.18 *->yC.vCgk....g.llqCdkgCdrwyHv.Clgpple............ +C+vC+k+ ++ + + C C+ H++C +++ ++ +++ + +++ gi|2464947 81 FCtVCRKrldeTpAVHCL-VCEYFAHIeCQDFAVPdctenatyvpgk 126 ..............epdg.wyCprCk<-* + + +++++ ++++ +++ +C Ck gi|2464947 127 ellnvkhqhhwregNLPStSKCAYCK 152 SM00184: domain 1 of 1, from 82 to 151: score -0.8, E = 0.47 *->CpICle.......pvvlpCgH.FCr.Ci................... C++C + ++++ + +l C + +C + ++ +++ + ++++ gi|2464947 82 CTVCRKrldetpaVHCLVCEYfAHIeCQdfavpdctenatyvpgkel 128 ...................CPlC<-* + ++++++++++ +++++C C gi|2464947 129 lnvkhqhhwregnlpstskCAYC 151 PF04236: domain 1 of 1, from 91 to 140: score -23.8, E = 6.3 *->dshdeFlTPsqYCfgvsGhvdtviCyftgCqnlaFIrCARCKkfPar +TP+ +C+ ++ + Cq++a C + gi|2464947 91 ------ETPAVHCLV----CE--YFAHIECQDFAVPDCTENATY--V 123 tGknfiCFnHfVvsefhacpcp<-* Gk++ v +h + gi|2464947 124 PGKEL-----LNVKHQHHWREG 140 SM00215: domain 1 of 1, from 100 to 166: score -22.0, E = 3.9 *->CqNAvnnGsyYppLNkGakWdDiALtGRtEDtdDCsnrCtClnGrvs C y++++ +++++ DC+ + t G+ + gi|2464947 100 C-------EYFAHIECQDFAV-----------PDCTENATYVPGKEL 128 lCtkvwCgpkpClLhgslsKSSnlsgeCplgqgcvpslsdqKqYtvHGDC l k + +++ g l+ +++ C+ + +s s+ C gi|2464947 129 LNVKHQHHWRE----GNLP----STSKCAYCKKTCWS-SE---------C 160 fsvltsP.C<-* lt ++C gi|2464947 161 ---LTGYrC 166 PF00130: domain 3 of 3, from 135 to 185: score 54.9, E = 1.9e-13 *->HrFkrttfyksptfCdhCgellwgla.kQGlkCsnCglnvHkrChek H+++ ++ +s+ C++C++ +w +G++C++Cg++ H+ C + gi|2464947 135 HHWREGNL-PSTSKCAYCKKTCWSSEcLTGYRCEWCGMTTHAGCRMY 180 VptnC<-* +pt+C gi|2464947 181 LPTEC 185 SM00109: domain 3 of 3, from 135 to 185: score 51.0, E = 2.9e-12 *->Hkfvfrtf.kptfCdvCrksiwgsfkqaaksqglrCseCkvkcHkkC H++++++ ++ ++C++C+k++w+s + g+rC++C+++ H C gi|2464947 135 
HHWREGNLpSTSKCAYCKKTCWSSECLT----GYRCEWCGMTTHAGC 177 aekvpaqshksglsC<-* +p C gi|2464947 178 RMYLPT-------EC 185 SM00217: domain 1 of 1, from 138 to 174: score -21.1, E = 4 *->KpGsCPwvqlpiiasCplgnppnkCssDsqCpGnkKCCengCGKksC ++G P +++ + ++ C+s s+C + C CG gi|2464947 138 REGNLP-----STSK--CAYCKKTCWS-SECLTGYRC--EWCG---- 170 ltPv<-* +t gi|2464947 171 MTTH 174 SM00343: domain 1 of 1, from 147 to 162: score -4.4, E = 6.3 *->kCynCGkeGHiardCpk<-* kC +C+k+ ++++C gi|2464947 147 KCAYCKKTC-WSSECLT 162 PF00219: domain 1 of 1, from 148 to 197: score -23.6, E = 5.4 *->CprPcGGpCpaerlarCpPgPpvaPpaecaelvredGCGCClvCArq C+ C C + ec + r C+ C gi|2464947 148 CA-----YCKKT----CWSS-------ECLTGYR------CEWCGMT 172 eGeaCGvytPrDeskGLyCarGaedaakaLrCrpppG<-* + C y+P +C++G +L+ +p gi|2464947 173 THAGCRMYLPT------ECNFG------ILQPIYLPP 197 PF04928: domain 1 of 1, from 183 to 1018: score -47.5, E = 1.2 RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx *->stkqyGvtkpislagpkekdvkltesLieeLkefgsf.......... ++ ++G+ +pi l+ p++ + +te ie++ +++ ++++ ++ + gi|2464947 183 TECNFGILQPIYLP-PHSVSIPRTEVPIEAIIGVQVKsktslvrdys 228 RF .................................................. ++++ + + ++ ++++ ++ + ++ + ++++ ++++++ +++++++ gi|2464947 229 cpspdlscpipgagsgsltslglkellelhrqrleqskqhfllstpptpt 278 RF .................................................. + ++ + +++++++ + ++++++ ++++++++++++++++++++++++ gi|2464947 279 scgsislchsptpssltvgetsneaeqdrerdqdqpeeepeeenteqdsa 328 RF .................................................. + ++++++ ++ ++ ++ +++ + ++ ++ ++++++++++ ++++ gi|2464947 329 lqlttstsnvignlqkwpsansslhllytnlfrklgqgkrrrkrgissgg 378 RF .................................................. ++++++++ +++ + ++++ +++ ++ + ++++ +++++++ ++++ gi|2464947 379 lspsededdvdggvcdisggdlsddydhcdvalrrrslrsrqprdvsetd 428 RF .................................................. +++ + + ++++ ++++ ++++++++ +++++ +++ + ++ + +++ gi|2464947 429 yhgdaeaeaegetvprescyetsdtggeltntddldsslnlisnlsynss 478 RF .................................................. 
++++ + +++ + ++ +++ +++++ ++++++ + ++++++++++ gi|2464947 479 nnsnacnvpggatapdarntattsttapgksghalsvqggrqqpktgala 528 RF .................................................. + ++++++ ++++ ++++++ +++ +++++++ ++ +++ + + + + gi|2464947 529 qikpkpkpilmpkhkaqgkggslssplsnsnssdcssaspsapatllqls 578 RF .................................................. + +++++ +++ + ++ ++ +++ +++++++++++ + +++++ + gi|2464947 579 pvgrsksfqesaaitavsrykkygrglfqrrrskrspknavgvggksnys 628 RF .................................................. ++ +++ + + ++++++ ++ +++ + ++ + ++ +++ + ++ + gi|2464947 629 ldrlsqnieitiqdedgnfhpyddnyhmlagrldatdvdddvgfddlyld 678 RF .................................................. +++++ +++ ++ ++++ +++++ ++ ++++ ++ ++ +++ + + gi|2464947 679 drpsgasddvafagdisdggassrsrasdasdghvlgrllrqvrqglsvg 728 RF .................................................. ++++ ++++ ++ +++ +++++++ +++++ ++ ++++++++++ ++++ gi|2464947 729 wrkpryqkrrarsiseefssgdtprfkdeesaskaesghgpssggagggg 778 RF .................................................. 
++++ ++++ + + + ++++++ +++++++++++++++++++++++++ gi|2464947 779 gsggaggssaagasasaaggssghyrpdsgsghksdksekdrekkekere 828 RF xxxxxxxxxxxxxxxxxxxxxxxxxxx .......................eilkLVPnkyevFrltLraiKlWAkrr +++ + + +++++ ++++ + i+++ + +++Lra + + gi|2464947 829 ekdiemikvfdgnnsfrrqqyrvIIVQRTYTLEQLLTTALRAFHITRDPQ 878 RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx giYsNvlGFlgGvaWAiLVAriCQlYPnavpstlvekfFlvfsqWlrhnw + Y l + G + + P + l k ++ + h++ gi|2464947 879 AFYLTDLYAPAGMEDTPMLDPT----PVLNLVHLEGKRPAIYLRF--HDR 922 RF xxx xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx pnP...VlLkeinsdsieernlqvrvWdprknk...sDricyhlmPiiTP + +V+ ++++ +e+ +v+v ++ + k+ +D+ ++ + gi|2464947 923 DRGhvrVYPGKLQCSMLEDPYVSVPVDNSTVIKdliRDA--LDKFGLQDN 970 RF xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Ay.PqqnstynVsestlkvileefkrgleildeielgkaeWskLfeka<- + + +s + + il+ +r +i++++ + ++ ++L+ + gi|2464947 971 QIqDYRCSEVLLDRGVTERILSWNERPWDIMKQLGKDSIRQMELMRFY 1018 RF * gi|2464947 - - PF04395: domain 1 of 1, from 189 to 835: score -557.5, E = 8.9 *->lfsltilaIyiliteSegyetClRKtplYHdtqkkiepKentDhkAs +++ ++l+ +++S + + +p e + gi|2464947 189 ILQPIYLP---PHSVS---------------IPRTEVPIEAI----- 212 AtykYLsiaekkEkerflesFnWtkIkeeVKdaFirkCdlssnkdRLdgv I VK+ + + gi|2464947 213 -------------------------IGVQVKS--------K-------TS 222 ykYNYtiaysltVskksekktkgtdiestykkitknivastlslskvdee + +Y+ s + s + + ++ +s ++k + ++ + l+ sk gi|2464947 223 LVRDYS-CPSPDLSCPIP-GAGSGSLTSLGLKELLELHRQRLEQSK---- 266 yTFttiiyaTvtssleTsSvPiddrSsdyvntiaikiLikvLdVNETele gi|2464947 - -------------------------------------------------- - aylisnESLimAkyinttknkdskvdfnlPkvehitYenskCnNiTvdkV ++P+ + ++s C T + gi|2464947 267 ----------------------QHFLLSTPPTPTSCGSISLCHSPTPSSL 294 tIGnFSvidvdsaenakedIrIiFkGvStsdPYvdSDkfieCitkkInnc t+G +++ ae+ e d D+ +e + ++ gi|2464947 295 TVGE----TSNEAEQDRE---------------RDQDQPEE---EPEEEN 322 knsndvkgkvkveKsvTsNCekCsMgLMaeVtsvPeEFnnTLKenGikdD ++ + + + +v++N L++ + gi|2464947 323 TEQDSALQLTTSTSNVIGN----------------------LQKWPSANS 350 
dlteLYNFYlCmltnnddCseYvpLtekikedtlksLssYsliktsrsRr l L Y +L +k ++ + Rr gi|2464947 351 SLHLL----------------YTNLFRK--------------LGQGKRRR 370 KsRPRRnAGDsRDtdeeteiS.sEdLe......CmYlsYdtddDDDredd K R + + +S+sEd ++ +++ C d +dD gi|2464947 371 K----------RGIS-SGGLSpSEDEDdvdggvCDISGGDLSDD------ 403 drydqCvnspekeItaKsRkkRsdseeknekRKqsYKnRPKRsLdddltd yd+C sR +R se + ++ d + gi|2464947 404 --YDHC-DVALRRRSLRSRQPRDVSETDYHG-------------DAEAEA 437 ylKKyLgie.eVIPkkAsHlQVGistsYgkseedgViGDs.....sIysd e+e P++ + ts ++e ++ D+ +++ ++s+ gi|2464947 438 --------EgETVPRESCY-----ETSDTGGELTNT--DDldsslNLISN 472 vKdrAkkllekimPsvPldTdpeslyakirkptkikLPpdsKnivtealr ++ +++ + + +vP ++ + + + ++t gi|2464947 473 LSYNSSN--NSNACNVPGGATAPDARNTATTSTT---------------- 504 siieqKqeSvkevLkteselssssieeaetegkskhkssveteivvLskd +gks h +sv gi|2464947 505 -----------------------------APGKSGHALSVQGG------- 518 DldvkenysrkglvsriddepvyedirsvdrlkekirdyrkkGGkkkess r +++ ++ + +k + k +Gk + gi|2464947 519 ---------RQQPKTG-------ALAQIKPKPKPILMPKHKAQGK----G 548 isvlkevsrtssgmfdvDtStvvvkPsrrkitsasrnfessskpsrrlss s+ ss+++++++S+ + + + t + + sk+ ++ gi|2464947 549 GSL-------SSPLSNSNSSDCSSASPSAPATLLQLSPVGRSKSFQESAA 591 develeyeknyrdSlepekssssrkrCkrglnkAvcaiLgrvplpeknnn + ++ y k+y +++ r+r kr+ + Av +++ n gi|2464947 592 ITAVSRY-KKY------GRGLFQRRRSKRSPKNAV--------GVGGKSN 626 dvvkdaravssvvdskrsSSaslySllPgvdtgeAAaagniardRqanaq ySl+ + +e gi|2464947 627 -----------------------YSLDRLSQNIE---------------- 637 venesitTPltRraaaaRrfqqGRvpdrgetnlvnelqklpls...tsqL + i++ + + ++ + + l ++ + t gi|2464947 638 ---ITIQDEDGNFH-----------------PYDDNYHMLAGRldaTDVD 667 snsvykeavqlstsgdesllqvpqRpsqsvvqgstPvrpsPPlpPardrl ++ + +++ ++++ s+ + +++ + +s ++r s+ gi|2464947 668 DDVGFDDLYLDDRPSGASDDVAFAGDISD-GGASSRSRASDASDG----- 711 rrPlaAiipedsipkskgipkvvsprlRrStsGvvcGMlQSkvksdgtYs v+++ lR G + G + gi|2464947 712 --------------------HVLGRLLRQVRQGLSVGWRKPR-------- 733 LvqlPiDGYPGnPArRPLPRIPiRsDssDssDHiYEtiGsRsRsYAGssG Y +Rs s ssG gi|2464947 734 
----------------------------------YQKRRARSISEEFSSG 749 .tHYnAiegSssdagsiessslesssgipkdkvvvgdrSgtssGGrrsGR +t e S s a s +++ Sg+ +GG++sG gi|2464947 750 dTPRFKDEESASKAESGHGP-----------------SSGGAGGGGGSGG 782 rnsvrseSgySsddsevsmEGSVYqPSiKElnsksskkYkekMkkISsSf ++ + ++ s + + ss+ Y+ + + + gi|2464947 783 AGGSSA-----AGASASA-------------AGGSSGHYRPD-SGSGHKS 813 DKsmaFglAmQligQqaInrqsRseriqkddrdkaEkvFEAVStsLSTiG DKs + dr k Ek gi|2464947 814 DKS-------------------------EKDREKKEK------------- 825 ttmttAGIiaSPhLAfAGMGLSlISGLIDtGKDIYYlfSGkekPeDPlvK gi|2464947 - -------------------------------------------------- - kFNtYrelVsDtskmGVRKClmPGsDltIYlaYRNDSSFkPslEkLaLyF gi|2464947 - -------------------------------------------------- - iDtIdSvLYYLNTSnIIlDysLtVACPIGyLRSPdLDITAYTiLKFtTed e+ gi|2464947 826 ----------------------------------------------EREE 829 nVKFYqFtRLGAMLSKfPvVrLTCGrdiTLT<-* +di gi|2464947 830 -------------------------KDIEMI 835 PF03154: domain 1 of 1, from 256 to 1058: score -731.6, E = 5.8 *->ekhssRtfrargs..astlrsGRkkyPastdGvlSPvnedvrskGrn e h R +++ st+ P s G +S + + gi|2464947 256 ELHRQRLEQSKQHflLSTPPT-----PTSC-GSIS-----LCHSPTP 291 aaSavstssNdsK......aeavkksakkVkeeaASglknTKrqrekVas + v+ sN+ ++++++++ + + e S l T ++ gi|2464947 292 SSLTVGETSNEAEqdrerdQDQPEEEPEEENTEQDSALQLTTSTSNVIG- 340 dtldsDRAaskkakfqevsRPNlPse.gEGEssdlRslNdesaSdPklid +l + + Nl g G R + + S+ + d gi|2464947 341 -NLQK---WPSANSSLHLLYTNLFRKlGQGKRRRKRGISSGGLSPSEDED 386 QdnRslsgslPSPqDnEsDsDyaaqQqMlqlqPgalkaPslAaSAPsslP d + g + s D + + al+ sl P + gi|2464947 387 -D---VDGGVCDI----SGGDLSDDYDHCDV---ALRRRSLRSRQPRDVS 425 PassslPaPGPtrfaysvssssSaAaSsssssssSsvaPaaasLiQalPs a + s +s + + l s gi|2464947 426 ETDYHGDAE--------AEAEGETVPRESCYETSDTGGELTNT--DDLDS 465 lHPhrlPsPhtsLsvstaPPkytsAQPslPsqalhsQGPPgPhslqtGrL l s+ +y+ s gi|2464947 466 S----LN------LISNL--SYN--------------------SS----- 478 LansnahPqPFGLtPq...SsqaQstlgPsPvaaHhHstiQlqasQsalQ nsna P G t ++ + + st P + H s + gi|2464947 479 
-NNSNACNVPGGATAPdarNTATTSTTAP-GKSGHALSV--------QGG 518 qQQhhrneqPlPPaalamPLEGGssHHikPyatsPsLGslrqlPagqAHk QQ+ +ala kP + + gi|2464947 519 RQQPKT-------GALAQI---------KPKPKPI--------------L 538 hPPHLSqvSyfsanaNlPPvssalkslSSlStgsyPsaHPsPlQLgPQsa P H + + s+ ++s SS Psa lQ gi|2464947 539 MPKH------KAQGKGGSLSSPLSNSNSSDCSSASPSAPATLLQ------ 576 PlPfsPvqPtvlTsSasLstviatvASsPaGYKTasPPGlhqvgkraPfP sPv + S s i t+ S K + Gl q + +P gi|2464947 577 ---LSPV---GRSKSFQESAAI-TAVSRYK--KYG--RGLFQRRRSKRSP 615 GAyktavPgGykPisPPSFRtGtPPGYRtssPPAGPGtFKPGSssvqPGP k av G k R + G F P gi|2464947 616 ---KNAVGVGGKSNYSLD-RLSQNI---EITIQDEDGNFHPYDDNYHM-- 656 lsaAvsSGlPslPPPPaAPasGpPLsAvQIKeEa.ldEaEePESPvPPaR l A L A+ + + + d gi|2464947 657 --------L--------AGR----LDATDVDDDVgFDDL----------- 675 SPSPePkVVDvPSHASQSARFyKHL.DRGyNSCAR.sDLYFvPLeGSKLA D PS AS F +D G S R sD G L gi|2464947 676 ------YLDDRPSGASDDVAFAGDIsDGGASSRSRaSDASDGHVLGRLLR 719 KKRedlvEKvkREAEQkAREEkEREkEkEkEkEREREkERElERavkkAs R l + Qk R E R E ++ kA gi|2464947 720 QVRQGLSVGWRKPRYQKRRARSISEEFSSGDTPRFKDEE-----SASKAE 764 ssAHEGRAPledPsLsGPvhmRPsFEPgPsavAaVPPYlGPDTPALRTLS G Ps G + s g s A G A S gi|2464947 765 ----SGH----GPSSGGAGGGGGSGGAGGSSAA------GASASAAGGSS 800 EYARPHVMSPtNRNHPFYvPLnavDPGLLaYnvPaLYsvDPaiRERELRE RP D G +D RE +E gi|2464947 801 GHYRP-------------------DSG-------SGHKSDKSEKDREKKE 824 REiREREi............RERdLR....dRlKPGFEVKPsELdPLHgv +E E i+ + +++++ R R R L H gi|2464947 825 KEREEKDIemikvfdgnnsfRRQQYRviivQRTYTLEQLLTTALRAFHIT 874 tnPGldhFaRHsaLalqPGaaGlHPFasFHPs....LnPLERERLALAAG P F L+ +aG + P + +L LE R A gi|2464947 875 RDP--QAFY----LTDLYAPAGMEDTPMLDPTpvlnLVHLEGKRPAIY-- 916 PaLRPdMSYadRLAAERiHAERvAsLtsDPLARLQMlNVTPHHHQHSHIH LR + dR R H V P Q S gi|2464947 917 --LR----FHDR---DRGH-----------------VRVYPGKLQCSML- 939 SHLHLHQQDalHaaSAsPVHPLvDPLaaGsHLaRiPYPaGTLPNPLLgqP D+ S vD + L R L gi|2464947 940 -------EDP--YVSVP-----VDNSTVIKDLIR---------DALDKFG 966 lHEnEvLRHqlFaaPYPRDLPaalsa....PMSAAHQLQAMHAQSAELQR l n + R + +++P QL EL R gi|2464947 
967 LQDNQIQDYRCSEVLLDRGVTERILSwnerPWDIMKQLGKDSIRQMELMR 1016 LAlEQQqWLHa.HhhlHsvhLP...aQEDYYSrLKKEsDKqL<-* + q H++ l + LP++ +Q Y L K gi|2464947 1017 FYMQHKQDPHGpNIALFVGNLPtglSQRNYEQILNKYVTDEN 1058 PF01021: domain 1 of 1, from 416 to 830: score -296.2, E = 5.6 *->MESQQLsQnsri.lHGSAyASVTSKEVh..............sNQDP + S Q s + HG A A V++++ ++++++++ +N D gi|2464947 416 LRSRQPRDVSETdYHGDAEAEAEGETVPrescyetsdtggelTNTDD 462 LdVSASkleEfdkdSTKvNSQQeTTPasSAVPENhHHvSPQtAs...... Ld S + S NS P+ P + t ++++++ gi|2464947 463 LDSSLNLISNLSYNSSN-NSNACNVPGGATAPDARNTATTSTTApgksgh 511 vhsPQNG.qYqQqgMMTqNkAnaSnWafYqqPSMityshYQ......tSP s Q G+q + g + q k + +P +++ Q+++++ SP gi|2464947 512 ALSVQGGrQQPKTGALAQIKPKP-------KPILMPKHKAQgkggslSSP 554 a..YyqPdPqyqlPQYissvGtPLSTsSPdsidsftdsSevdsdeTkvkk ++ d + P + L SP + S + + + kk gi|2464947 555 LsnSNSSDCSSASPSAPA----TLLQLSPVGRSKSFQESAAITAVSRYKK 600 yVlPPhtLTSeedFstWVKfYIkFLkNSNLGdIIPtvnGkikRQiTddEl y + L K + SN n +i Q d gi|2464947 601 YGR---GLFQRRRSKRSPKNAVGVGGKSNYSLDRLSQNIEITIQDEDGNF 647 aylYNTFQiFAPfqlLPTWVKdILevdYaDIlkvLsKSveKMQsdtQElk + A +l T V d +v + D + + gi|2464947 648 HPYDDNYHMLA-GRLDATDVDD--DVGFDDLY---------LDDRPSGAS 685 DivaLANLeYdGSTsADaFEikVstIIdRLkeNnInvsdklACQLIlkGL D va A dG s s +v +l Q + +GL gi|2464947 686 DDVAFAGDISDGGASSRSRASDAS---------DGHVLGRLLRQ-VRQGL 725 SGdyKyLRytrrrklNMklaeLFldIqlIYdEnkisrlsKPsyrknhSde S ++ Ry++rr + I +E s P ++ + S gi|2464947 726 SVGWRKPRYQKRRA------------RSISEE--FSSGDTPRFKDEESAS 761 KNvSRsytNTTktKViaRNyQkTNsSKskaAkAHNvaTSskfsrvdNDsI K S + sS A A Ss r d s gi|2464947 762 KAESGHGPSSGGAGGGGGSGGAGGSS-AAGASASAAGGSSGHYRPDSGSG 810 skSTvesiyLsddndLsLrqetk<-* kS + + e k gi|2464947 811 HKSDKSEKDREKKEK---EREEK 830 PF03768: domain 1 of 1, from 469 to 519: score -14.7, E = 5.2 *->leGSltlNsdGgsdArlklkVplvGndknnvsaeVFAlGsvdlndqg l l+ Ns+ s A + Vp G A +d+ gi|2464947 469 LISNLSYNSSNNSNA---CNVP--GG----------AT----APDAR 496 kpvtaGaglAldNvnGHGLSLTkth<-* +t++ + + + GH+LS+++++ gi|2464947 497 NTATTSTT--APGKSGHALSVQGGR 519 SM00157: 
domain 1 of 1, from 678 to 903: score -87.0, E = 9.5 *->kkrPkPGGGWntGGsRYPGqgsPGGnrYPpqgggGgWGqPhGGgWGq +P G + + G s GG +++ + G+ G q gi|2464947 678 --DDRPSG--ASDDVAFAGDISDGGASSRSRASDASDGHVLGRLLRQ 720 PHgG...gWGqPHgGgWGqPHGgggWgqGGGthnqWnkPsKPKtnlKH.. G + gW +P ++ G t + s K + H++ gi|2464947 721 VRQGlsvGWRKPRYQKRRARSISEEFSSGD-TPRFKDEESASKAESGHgp 769 vAGAAAAGAvvGGLGGYmLGsams..........rPliHFGndyED.RYY G A G GG GG a ++ ++++++ rP G + + gi|2464947 770 SSGGAGGGGGSGGAGGSSAAGASAsaaggssghyRPDSGSGHKSDKsEKD 819 rEnmyRYPnqvyYrPvDqYsnqnnfvHDCvnitvKqHtvttttKGEnFtE rE + + + + n+f + + q t t E gi|2464947 820 REKKEKEREEKDIEMIKVFDGNNSFRRQQYRVIIVQRTYTL----EQLLT 865 tDvKimErvveqmCitqYqkEsqAyyqRgasvvlfssPpv<-* t ++ + Y + A + +l +P+ gi|2464947 866 TALRAFH--ITRDPQAFYLTDLYAPAGMEDTPMLDPTPVL 903 COG0008: domain 1 of 1, from 697 to 1149: score -325.3, E = 7.1 *->alvNAi.h.GKAn.kAVMGkvm.enpelRsma.ea.eiv.nfieqvn ++ ++ ++A+++ V+G +++ ++ +++++ gi|2464947 697 GGASSRsRaSDASdGHVLGRLLrQ----VRQGlSVgWRKpRY----- 734 smslmekk.lle.lype................................. 
+k+++ ++++e +++++++ +++++ ++ ++++++++++ ++++ gi|2464947 735 ------QKrRARsISEEfssgdtprfkdeesaskaesghgpssggagggg 778 .................................LpelevmgkVrTRFAPS ++++ ++++ + + + ++++++ ++++++++ ++ ++++ + gi|2464947 779 gsggaggssaagasasaaggssghyrpdsgsghKSDKSEKDREK------ 822 PTGyLHIGgARtALfNylfARhygGkFiLRIEDTDpTeRstpea.eeaIl + e++++e ++e+I+ gi|2464947 823 -----------------------------------K-EKEREEKdIEMIK 836 edLkWLGlnWDegpdvGGpYgpyyQSeRfdiYyeyaekLieeGkAYyCyc ++ + ++Q +Y+ + + + gi|2464947 837 V------FDGNNS--------FRRQ-----QYRVIIVQR---------TY 858 tpEELealRGtltregaeapgrdprYdgnlrlltkmeegeypageGeppv t E+L t++r+ + + Y l + me+ ++ pv gi|2464947 859 TLEQLLT---TALRAFHITRDPQAFYLTDLYAPAGMEDTPMLDP---TPV 902 vRfKvplegep.k.lnivfrDlvkGrIvfanad.....ilhDfvilRsDG + + v leg +++ f+D +G ++ ++ + + + ++ v gi|2464947 903 LNL-VHLEG-KrPaIYLRFHDRDRGHVRVYPGKlqcsmLEDPYV------ 944 yPTYnFAVVVDDhlMGITHViRGeDhlsNTprQillyeAlGwpvtwepPv V+VD ++T + +l+ +Al+ gi|2464947 945 ------SVPVD----------------NSTVIKDLIRDALDKFG------ 966 faHlplilneglSKrklkkledgkKLSKRdgpRaptveayRrrGylPEAl + + + + + + +rG++ gi|2464947 967 ------LQD-----------NQIQDYRCSEV--------LLDRGVT---- 987 rNflallGvwspddddqEifsleelirkFdlervskspavfDpkKLewlN ++ s+ + ++ + + +l ++ ++ ++ ++++ gi|2464947 988 ----ERIL--SW-N-ERPWDIMKQLGKD----------SIRQMELMRFYM 1019 aeyikeelddeplhpllkpflphpeaGerelpftrelkkdidyidredle + k + p+ l++ l p+ +++ + e gi|2464947 1020 QH--KQ-DPHGPNIALFVGNL--PTG-----------------LSQRNYE 1047 ellplvkerlktlkelrlltryffeapdvvedadedvakklfkeedkevL ++l+ k + ++ +++ p +++ + l e+ + + gi|2464947 1048 QILN---------KYVTDENKFISIGP---IYYEYGSV-VLTFEDSMKAV 1084 eklkekLeklkgvihWtpeeie.aikvrlaeelglKgkklfmplRvalTG +++++ L+++ i +++ + + l++ + + ++ +R l+ gi|2464947 1085 RAFYN-LRET---IIEDK---KlLVL-LLPNIE---PSMVPSDVRPLLVF 1123 saegpelfetiellGkeeqleRlgyalad<-* ++++ + +el++ ++ l + ++ gi|2464947 1124 VNVKSGGCQGLELIS---SFRKLLNPYQV 1149 PF03792: domain 1 of 1, from 706 to 876: score -116.2, E = 7.7 *->segtvrhdkrkdIgdlLqevlkItdqtLDeeqvNakKhqLkchpmkr s ++ + +g 
lL +v q L v +K+ + + + gi|2464947 706 SDASDG----HVLGRLLRQV----RQGL---SVGWRKPRYQKRRARS 741 AlfdVLcEiKeKtvLSvrnmkdeeppdPqlmRLDnMLvAEGVAGPdkGG. E + + + kdee +AE GP GG gi|2464947 742 -----ISEEFSSGD--TPRFKDEESAS----------KAESGHGPSSGGa 774 .........GaAAsllaaqasgGtSlsidGaDsalehsdYRqkLlqiRri ++++++++ G+ ++ a+++++G+S +Ds h ++ + R gi|2464947 775 gggggsggaGGSSAAGASASAAGGSSGHYRPDSGSGHKSDKS--EKDREK 822 yenElkkYekaCneFtehVenlLreQSrtRPItqkeiErmvniisrKFns e+E + + + + r+Q r v i++r + gi|2464947 823 KEKEREEKDIEMIKVFDGNNSFRRQQYR------------VIIVQRTYT- 859 iqvqLKQstCEaVmiLrsRFLD<-* qL ++ a i r D gi|2464947 860 -LEQLLTTALRAFHITR----D 876 PF01391: domain 1 of 1, from 761 to 818: score -86.6, E = 9.5 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp + + G G +G G G aG+ +G++ ++ + G+ G+ p gi|2464947 761 -SKAESGHGPSSGGAGGGGGSGGAGGSSAAGAS-ASAAGGSSGHYRP 805 pGppGapGapGpp<-* G + ++ + gi|2464947 806 DSGSGHKSDKSEK 818 PF00503: domain 1 of 1, from 819 to 1019: score -308.3, E = 6.9 *->seeekeqakrnkeIekqLkqekkkakrevKLLLLGAGESGKSTIlKQ e+ke+++ k+Ie ++ r gi|2464947 819 DREKKEKEREEKDIEMIKVFDGNNSFR-------------------- 845 MKIIHgnGFSqEEkkeyrpvIyqNivqsmrvlvdAmetLgIpfgdperea + +yr +I q ++ ++l A +++I+ +++ gi|2464947 846 -------------RQQYRVIIVQRTYTLEQLLTTALRAFHITRDPQ---- 878 seadavmiletaketeeveeplpkeyadaikaLWkDpGiqecfdRsrEfq a ++ + +++++ e p+ ++ L +G + +++ + gi|2464947 879 ----AFYLTDLYAPAG--MEDTPMLDPTPVLNLVHLEGKRPAIYLR---- 918 LnDSAkYFLdnldRisdpdYiPTeQDILrsRvkTTGIvEtkFsvkkltFR + + dR + + Y +kl+ gi|2464947 919 --------FHDRDRGHVRVY-----------------------PGKLQ-- 935 MfDVGGQRSERKKWIHCFEdVTAIIFlVALSEYDQvLfEDettNRMqESL +S + ED+ + gi|2464947 936 -----------------------------CS-----MLEDPYV------- 944 kLFdsIcNnrwFvntSiILFLNKkDLFeEKIkktpssisd.yFPeYedys s+ n+ +I kDL ++ + k ++d+ +Y+ + gi|2464947 945 ----SVPVD----NSTVI-----KDLIRDALDKFG--LQDnQIQDYR--C 977 .....sGppqdyeaAkeFIkkkFvslnrnnekpkKeIYsHfTCATDTnnI ++ + + ++ +s n+++ + gi|2464947 978 sevllDRG----------VTERILSWNERPWD------------------ 999 rfVFdaV.kDiIlqenLkecGL<-* +++ 
+kD I q+ L+ + gi|2464947 1000 --IMKQLgKDSIRQMELMRFYM 1019 SM00314: domain 1 of 2, from 831 to 922: score 9.9, E = 0.011 *->dtyvlrVyvddlsavdpgqtyktlrvskrtTardViqqllekfhltd d + ++V+++ +s + q+y+++ v + T ++++ +l+ fh+t gi|2464947 831 DIEMIKVFDGNNSF--RRQQYRVIIVQRTYTLEQLLTTALRAFHITR 875 edpeeYvLvevlp.sggkErvLlddenPlqlqklwprdaksprqsslrFv dp+ + L + + g ++ +ld + +l l+ l +++ + ++ gi|2464947 876 -DPQAFYLTDLYApAGMEDTPMLDPTPVLNLVHLEGKR--------PAIY 916 Lrkrdd<-* Lr +d+ gi|2464947 917 LRFHDR 922 PF04041: domain 1 of 1, from 887 to 1149: score -168.6, E = 4.9 *->elrKiptipileRpsyitGfdsriennPiiGRgpvrkpvavFNPavv + ++ p l p+ ++ + v+ gi|2464947 887 APAGMEDTPMLD-PT----PVLNL---------------------VH 907 lyegeLrVYaRfVmlYrayvediatfrIgLadssdGRCSeinfkkepepv l +++ +Y Rf+ r+ v + ++ + ++d+ ++ + v gi|2464947 908 LEGKRPAIYLRFHDRDRGHV-RVYPGKLQCSMLEDP---YVSVPVDNSTV 953 v..lPedkwElwGPsYvEDPRvvkigkryymTYTGydgkyarlcvattkn +++l +d +++G D i + y + l+ ++t gi|2464947 954 IkdLIRDALDKFG---LQD---NQIQD--------YRCSEVLLDRGVTER 989 lltwarlgNGeWvkfaefelnedrislwtksgaifPvKinGkyvmyfris l+w + W + k+ + ++ m ++ gi|2464947 990 ILSWNERP---WDIMKQLG----------KDSIRQMELMR--FYMQHK-- 1022 DnvHdldsniwLavSnvddlvhWenerepSYidvgsprpgmfdapFElKi + H +++ + v+n + ++ +n+ + ++ +++ i gi|2464947 1023 QDPHGPNIALF--VGNLPTGLSQRNYEQI------LNKYVTDEN---KFI 1061 GwgtPPveteeGwSLLVLiHGvNvaGrytenlvYRvGaaLlDlegRPskv g+ + e+G VL + + Y + + +++ gi|2464947 1062 SIGP--IYYEYGS--VVLTF---EDSMKAVRAFYNLRETIIEDKK-LLVL 1103 lartpeYILePeeewEvyGdvpnVVFPcgalvdegtgrvliyYGaADtav l+ e+ ++P+ + +p VF v g + gi|2464947 1104 LLPNIEPSMVPSDV------RPLLVF---VNVKSG------------GCQ 1132 GLAeipGdleelmnflke<-* GL i+ ++l+n gi|2464947 1133 GLELIS-SFRKLLNPYQV 1149 PF00788: domain 1 of 1, from 923 to 1024: score 60.5, E = 4e-15 *->dqgvlrvyfqdllsvtpgvayKtirvssedtapdViqeaLeKfrldd d g++rvy++ l ++y ++ v +++ + d+i++aL Kf+l+d gi|2464947 923 DRGHVRVYPGKLQCSMLEDPYVSVPVDNSTVIKDLIRDALDKFGLQD 969 RMedpeeYaLvevlltregalesggkerkLpddenPlqlrlnlprddrrs + +++Y evl l++g++er+L+ +e+P ++++l++d s gi|2464947 970 
--NQIQDYRCSEVL------LDRGVTERILSWNERPWDIMKQLGKD---S 1008 vrqqsslrFlLkrrdd<-* +rq rF+++ ++d gi|2464947 1009 IRQMELMRFYMQHKQD 1024 SM00314: domain 2 of 2, from 923 to 1024: score 48.5, E = 1.7e-11 *->dtyvlrVyvddlsavdpgqtyktlrvskrtTardViqqllekfhltd d +++rVy++ l ++y+++ v++ t +d+i+++l kf+l+d gi|2464947 923 DRGHVRVYPGKLQCSMLEDPYVSVPVDNSTVIKDLIRDALDKFGLQD 969 edpeeYvLvevlp.sggkErvLlddenPlqlqklwprdaks.prqsslrF ++ ++Y evl+++g++Er+L +e+P ++k +++d s ++ + +rF gi|2464947 970 NQIQDYRCSEVLLdRGVTERILSWNERPWDIMKQLGKD--SiRQMELMRF 1017 vLrkrdd<-* +++++ d gi|2464947 1018 YMQHKQD 1024 PF04396: domain 1 of 1, from 937 to 1001: score -40.4, E = 9 *->eesaeakTsVfWDvEdCPvPdGldarrVapnIksALeksGYpGpVSI +++ +sV P + + + + I+ AL k gi|2464947 937 SMLEDPYVSV---------PVD-NSTVIKDLIRDALDK--------- 964 tAYGdltkiprdtfilvsstiqllraLsstGIsLkhvPaGdkKdArdkki +G + ++ d ++ +++L+++G+ gi|2464947 965 --FGLQDNQIQD--------YRCSEVLLDRGV------------------ 986 lvdillWaldNppPanlm<-* + il+W N P +m gi|2464947 987 TERILSW---NERPWDIM 1001 PF02376: domain 1 of 1, from 969 to 1022: score -42.8, E = 3.1 *->nqqigmneelDTaeIarrvkeeLkrhnIgQriFAekvLGlSQGslSd qi ++ e L ++++ ri gi|2464947 969 DNQIQDYR----------CSEVLLDRGVTERI--------------- 990 LLrkPK.PWskLtqkGREpFrRMqnWLsdpnavrdlilqqek<-* L++ ++PW++++q G+++ r M + ++ ++q k gi|2464947 991 -LSWNErPWDIMKQLGKDSIRQM---------ELMRFYMQHK 1022 SM00361: domain 1 of 1, from 1031 to 1103: score -4.9, E = 1.3 *->lvlvnglvspeeakdEdferelseeeeyfgevgkinKivinkvtkrl ++v +l +++++ + ++e+ l ++ v+++nK++ ++ + gi|2464947 1031 ALFVGNL-PTGLS-QRNYEQILN----KY--VTDENKFISIGPIY-- 1067 NayenhkrgsggvYitFFersEDAarAivdlnGryfdGRtlkae<-* + gs v++tF e s A rA +l + ++ l+ gi|2464947 1068 -----YEYGS--VVLTF-EDSMKAVRAFYNLRETIIEDKKLLVL 1103 SM00360: domain 1 of 1, from 1031 to 1103: score 11.0, E = 0.0059 *->tlfVgNLndppdvteedLrelF.kevksvevfraeteskfGkvvsvr lfVgNL p ++ +++ +k+v k +s+ gi|2464947 1031 ALFVGNL--PTGLSQRNYEQILnKYV-----------TDENKFISIG 1064 ivrdkdnilirressleqkvqlgkdsgTGkskGfaFVeFedeedAekAll ++ + G +Fed A +A gi|2464947 1065 
PIYYEY--------------------------GSVVLTFEDSMKAVRA-- 1086 iealnaskGkeledgGrptlglrVe<-* +l + + ++++ l V gi|2464947 1087 FYNLR---ETII--EDKK---LLVL 1103 PF00076: domain 1 of 1, from 1032 to 1102: score 11.2, E = 0.015 *->lfVgNLppdvteedLkdlFskfGpivsikivkDhiekpketgkskGf lfVgNLp+ +++ +++ k+ ++ i++ G gi|2464947 1032 LFVGNLPTGLSQRNYEQILNKYVTDENKFISIG------PIYYEYGS 1072 aFVeFeseedAekAlealnGkelggrklrv<-* +Fe++ A +A +l +++++++kl v gi|2464947 1073 VVLTFEDSMKAVRAFYNLRETIIEDKKLLV 1102 PF03208: domain 1 of 1, from 1098 to 1256: score -89.5, E = 6.5 *->sslevissikeslqsslsslRPWgEFldfsa.....fSrPsSfseat ++l v+ + +++ s RP ++F+++++++ +++ + sSf + gi|2464947 1098 KKLLVLLLPNIEPSMVPSDVRPLLVFVNVKSggcqgLELISSFRKL- 1143 sRvkrNlsyFrvNYvlIfavliiysLitnPllLvvililva.awlfLYlr l ++v + + l +y +P+ +vi l+++++ LY++ gi|2464947 1144 ------LNPYQVFDLDNGGPLPGYV---QPITVFVIRPLIFdSIISLYVF 1184 rsldepLVlfGrsisdrqlyvgLilvsipvlf..Ltgvgs........vl r + i++ ++v+ + ++i+ +++ L +vg++++ ++++++ gi|2464947 1185 R--Q---------ITNYKILVCGGDGTIGWVLqcLDNVGQdsecssppCA 1223 iwtvgas..vvvvlvHAafrenpddlfvdEqee<-* i +g++++++ vl ++ + + +d ++ ++ gi|2464947 1224 IVPLGTGndLARVLCWGSGYTGGEDPLNLLRDV 1256 COG1597: domain 1 of 1, from 1117 to 1512: score -73.3, E = 0.0053 *->mkrarliyNptaGkgkakkalrevadrLe.................k +++ ++++N+++G+ ++ + +++ L + + + +++++ ++ + gi|2464947 1117 VRPLLVFVNVKSGGCQGLELISSFRKLLNpyqvfdldnggplpgyvQ 1163 rggeasvrvttepgvagdAvriakeaaadgrieavDlviaaGGDGTinev + + +r ++ + + ++++ +++ ++++GGDGTi+ v gi|2464947 1164 PITVFVIRPLIFDS--IISLYVFRQITNYK-------ILVCGGDGTIGWV 1204 angLagtdgevkafnkpaLgilPaGTgNdFARaLgIPrddieaaakaiad + L + + ++ + p+ +i+P+GTgNd+AR L + ++ + + gi|2464947 1205 LQCLDNVGQDSE-CSSPPCAIVPLGTGNDLARVLCWGS-GYTGGEDPLNL 1252 gktrqvDlgrasyglqrekaneryflniaggGfgae.vtkrvneelkrrl D+ +a+ e +r+ + + e+++k+ + ++++ gi|2464947 1253 L----RDVIEAE-----EIRLDRWTVVFHPEDKPEEpAMKAPSQTTGKKK 1293 GplaYllaalrrlsrlrpfplairvdgdgksfegealfllvnntn..... 
+ + l+ +++++ + p+ + d+ g+ ++ ++v+n++ + + gi|2464947 1294 KAHQAHLSQSQQTNQHHQLPALTSSDISGGAQNEDNSQIFVMNNYfgigi 1343 .................................................. + + + ++ +++++++ +++ ++++ + + ++ +++ ++ +++ gi|2464947 1344 dadlcldfhnareenpnqfnsrlrnkgyyvkmglrkivgrkavkdlqkel 1393 .............................npyyGGgmklaPdasldDGll + + +++ + ++ ++ + + +++ np ++ ++ DG+l gi|2464947 1394 rlevdgkivelppvdgiiilnilswgsgaNPWGPDKDDQFSTPNHYDGML 1443 dviivkaase.a.qllellrllrdvlrGkkhrehpevehlqakkieieth v+ v +++ l ++ +r+++r q+ +i+i++ gi|2464947 1444 EVVGV----TgVvHLGQIQSGIRTAMRI-----------AQGGHIKIHLN 1478 gdqakpipvqlDGEiypgalPvririlpgalrvlvPadr<-* + +pvq+DGE+ P + +l+ al + + + + gi|2464947 1479 T----DMPVQVDGEPW-IQSPGDVVVLKSALKATMLKKN 1512 PF00781: domain 1 of 1, from 1119 to 1267: score 167.8, E = 2e-47 *->plLVfvNPkSGggqgekelaseskllqkfrelLnprqVfdltktggp plLVfvN kSGg+qg +l+ +fr+lLnp+qVfdl++ ggp gi|2464947 1119 PLLVFVNVKSGGCQGL-------ELISSFRKLLNPYQVFDLDN-GGP 1157 avg....................lelfrdlpdfkeqdqGddrvlvcGGDG +g ++ + ++ ++ +l +fr++ ++ +lvcGGDG gi|2464947 1158 LPGyvqpitvfvirplifdsiisLYVFRQITNY--------KILVCGGDG 1199 TvgwVlnaldklelplqcqrefpkPpvgilPlGTGNdLarvLgwgggydg T+gwVl++ld+ ++ c Pp++i+PlGTGNdLarvL wg gy+g gi|2464947 1200 TIGWVLQCLDNVGQDSECS----SPPCAIVPLGTGNDLARVLCWGSGYTG 1245 aqlinekllkilgdaleeadtvmldrW<-* e +l+ l+d++ ea + ldrW gi|2464947 1246 ----GEDPLNLLRDVI-EAEEIRLDRW 1267 SM00046: domain 1 of 1, from 1119 to 1267: score 166.6, E = 4.6e-47 *->plLVfvNPkSGggqgeellkseskllrkfrelLnprqVfdltktggp plLVfvN kSGg+qg +l+ +fr+lLnp+qVfdl++ ggp gi|2464947 1119 PLLVFVNVKSGGCQGL-------ELISSFRKLLNPYQVFDLDN-GGP 1157 dvgle....................lefrdvpkfkeqsdqkgddrvlvcG +g++++ + ++ ++ + + fr++ ++ +lvcG gi|2464947 1158 LPGYVqpitvfvirplifdsiislyV-FRQITNY----------KILVCG 1196 GDGTvgwVlnaldkrelplqcqvedrefpePPvailPlGTGNdLarvLgw GDGT+gwVl++ld+ +++ c+ PP+ai+PlGTGNdLarvL w gi|2464947 1197 GDGTIGWVLQCLDNVGQDSECS-------SPPCAIVPLGTGNDLARVLCW 1239 gggydginekllkilkealeeadtvkldrW<-* g gy+g e +l l++++ ea+ + ldrW 
gi|2464947 1240 GSGYTG-GEDPLNLLRDVI-EAEEIRLDRW 1267 COG0284: domain 1 of 1, from 1333 to 1511: score -107.0, E = 2.3 *->laadlplsvmndprlIrvALDvpd..redalalveelddeeyvlfiK ++ + ++++d +l+ LD+++ ++e+ ++ +l + ++K gi|2464947 1333 FVMNNYFGIGIDADLC---LDFHNarEENPNQFNSRLRN--KGYYVK 1374 vGlaFFeLflsaGpdivkeLkargklgvkvFLDLKlhDIPnTvalaakal Gl + G ++vk+L ++ ++ D K+ P ++ + gi|2464947 1375 MGL---RKIV--GRKAVKDLQKEL----RLEVDGKIVELPPVDGIIILNI 1415 aelgplAaDmvtVHafgGeemlraavealeelgkGkrPlLiaVtvLTSms +g +g + + g + gi|2464947 1416 LSWG-----------SG-----------ANPWG----------------P 1427 epgllqeigidnsladqvirlaklakeaGldGvVcGAspqeaaaiRealg ++++ ++ ++++ + + l + G+ GvV +q + iR a+ gi|2464947 1428 DKDD----QFSTPNH----YDGMLEVV-GVTGVVH--LGQIQSGIRTAMR 1466 egspdflilTPGIRaDkgsakgDQgRvmTpaeAiaaGaDyiVVGRpItqA + a ++g ++++ V G p q+ gi|2464947 1467 --I-----------A-------QGGH---IKIHLNTDMPVQVDGEPWIQS 1493 GedPvaaaeaireaaemalee<-* + v +++ ++ + ++l++ gi|2464947 1494 -PGDVVVLKSAL--KATMLKK 1511 SM00045: domain 1 of 1, from 1334 to 1489: score 262.2, E = 7.8e-76 *->iMNNYFSiGvDAkiaLeFHnsREanPekFnSRlkNKlwYfelGtkel +MNNYF+iG+DA+ +L+FHn+RE+nP FnSRl+NK +Y+++G++++ gi|2464947 1334 VMNNYFGIGIDADLCLDFHNAREENPNQFNSRLRNKGYYVKMGLRKI 1380 fa.rtcKdLheqIeLecDGvdidlpnkdlslEGIivLNIPSygGGtnLWG + ++ KdL + +Le+DG++++lp GIi+LNI S+g+G+n+WG gi|2464947 1381 VGrKAVKDLQKELRLEVDGKIVELP----PVDGIIILNILSWGSGANPWG 1426 ePfgskkkravcgifkksftdkedlnfekqsidDgllEVVGvtgamhmaq ++d++f+ + ++Dg+lEVVGvtg++h++q gi|2464947 1427 P--------------------DKDDQFSTPNHYDGMLEVVGVTGVVHLGQ 1456 vrtsiqvglasiilvkllKgrRiaQCsevrlkdtiltkktiPmQVDGEP< + q g + + RiaQ+ ++++ + t +P+QVDGEP gi|2464947 1457 I----QSGIRT--------AMRIAQGGHIKI--HLNT--DMPVQVDGEP 1489 -* gi|2464947 - - PF00609: domain 1 of 1, from 1334 to 1489: score 237.0, E = 2.9e-68 *->iINNYFSiGVDAsialrFHimREknPekFnSRmkNKlwYfefGtset ++NNYF+iG+DA ++l+FH +RE+nP FnSR++NK++Y+++G+ ++ gi|2464947 1334 VMNNYFGIGIDADLCLDFHNAREENPNQFNSRLRNKGYYVKMGLRKI 1380 l.astcknLhesvelecdGqevdLsnrDaslEGIiiLNIPSygGGsnLWG ++ + k+L + +le+dG+ v+L+ 
GIiiLNI S+g+G+n+WG gi|2464947 1381 VgRKAVKDLQKELRLEVDGKIVELP----PVDGIIILNILSWGSGANPWG 1426 eskkgkgdigefkksitdpkdlktavqdidDgLlEVVGlegamhmgQiyT +kd++++ + + Dg lEVVG++g++h+gQi gi|2464947 1427 P-----------------DKDDQFSTPNHYDGMLEVVGVTGVVHLGQI-- 1457 siqlklasWvkLmkgrRlaQCsevRlkDtiktkktlPMQVDGEP<-* q + + + R+aQ+ + +i+ + +P+QVDGEP gi|2464947 1458 --QSGIRT------AMRIAQGGHI----KIHLNTDMPVQVDGEP 1489 SM00463: domain 1 of 1, from 1346 to 1416: score -20.0, E = 9.4 *->ewtLDLHGltveeAlqaLkkfldaarlrgletgervdlpkkleIitG +++LD+H ++e q+ ++ + + + gi|2464947 1346 DLCLDFHNAREENPNQFNSRLRNKGYYVK------------------ 1374 kGkhslvngkskvkpalkehlqHkhvesfrfaepsegnsGvlvvklk<-* +G ++ g+++vk ke+ +v++ + + +G +++ ++ gi|2464947 1375 MGLRKI-VGRKAVKDLQKELRL--EVDGKIVEL--PPVDGIIILNIL 1416 PF04014: domain 1 of 1, from 1447 to 1487: score -14.1, E = 5.1 *->ivvKVdrnGqIVIPkeiRekLGikeGDiLEievdgdggeIilrkykp v+ V +GqI iR ++ i+ G + i ++ d +++++ gi|2464947 1447 GVTGVVHLGQIQ--SGIRTAMRIAQGGHIKIHLNTDM----PVQVDG 1487 <-* gi|2464947 - -
//
-------------- next part --------------
###################################################################################################### */
# COPYRIGHT INFORMATION
# Pfam DOMAIN RESULTS PARSER
# @AUTHOR: Wagied Davids
# @DATE: 22.01.2004
###################################################################################################### */

import sys
import string
import re
import time

class PfamEntry:
    '''
    Prototype class Entry structure
    @author: Wagied Davids
    @date: 22.01.2004
    @copyright: Wagied Davids, ?, 2004
    '''
    # STATIC DATA
    NO_HITS= '[no hits above thresholds]';

    # STATIC REGEX OBJECTS
    REGEX_FAMILY_SCORES= re.compile( r'((\S.*?)\s+(\S.*?)\s+((-| )\S.*?)\s+(\S.*?)\s+(\d+))', re.MULTILINE | re.DOTALL );

    def __init__( self, query= None, accession= None, description= None, family_scores= [], parsed_domains= [], alignments= [] ):
        '''
        Constructor for Pfam Entry structure
        @param (
                query= None,
                accession= None,
                description= None,
                family_scores= [],
                parsed_domains= [],
                alignments= []
                )
        @return (None)
        '''
        if query != None:
            self.query= query;
            self.accession= accession;
            self.description= description;
            self.family_scores= family_scores;
            self.family_scores_hitlist= []; # FAMILY SCORES HITLIST FOR SCORE ENTRIES
            self.parsed_domains= parsed_domains;
            self.alignments= alignments;
        else:
            print 'Error: Query must be provided';
            sys.exit( -1 );

    def getQuery( self ):
        '''
        Retrieves the QUERY
        @param (None)
        @return (String: QUERY )
        '''
        return self.query;

    def getAccession( self ):
        '''
        Retrieves the ACCESSION
        @param (None)
        @return (String: ACCESSION);
        '''
        return self.accession;

    def getDescription( self ):
        '''
        Retrieves the DESCRIPTION
        @param (None)
        @return (String: DESCRIPTION)
        '''
        return self.description;

    def getFamilyScoresRaw( self ):
        '''
        Retrieves a list of FAMILY SCORES
        @param (None)
        @return (List: FAMILY SCORES)
        '''
        return self.family_scores;

    def getNoOfFamilyEntries( self ):
        '''
        Retrieves the number of hits per query
        @param (None)
        @return (Integer: number of hits per query)
        '''
        return len( self.family_scores );

    def getFamilyScoresML( self ):
        '''
        FINE-GRAINED CONTROL OVER FAMILY CLASSIFICATION AND SCORE RESULTS
        @param (None)
        @return ()
        '''
        # BEGIN FAMILY_SCORE_LIST TAG
        family_scores= "\n";
        family_scores_counter= 1;
        for score_entry in self.getFamilyScoresRaw():
            MATCH_SCORE_ENTRY= PfamEntry.REGEX_FAMILY_SCORES.search( score_entry );
            if MATCH_SCORE_ENTRY != None:
                # BEGIN FAMILY_SCORE_HIT TAG
                family_scores= family_scores + "\t\t\n" % ( family_scores_counter );
                # EXTRACT INFORMATION FROM MATCH_SCORE_ENTRY
                # MATCH_SCORE_ENTRY.group( 1 ) equals WHOLE ENTRY
                FAMILY_MODEL= MATCH_SCORE_ENTRY.group( 2 );
                FAMILY_DESCRIPTION= MATCH_SCORE_ENTRY.group( 3 );
                FAMILY_SCORE_VALUE= MATCH_SCORE_ENTRY.group( 4 );
                # MATCH_SCORE_ENTRY.group( 5 ) equals '-' IF PRESENT
                FAMILY_E_VALUE= MATCH_SCORE_ENTRY.group( 6 );
                FAMILY_N_VALUE= MATCH_SCORE_ENTRY.group( 7 );
                # FORMAT ENTRY TAGS
                family_scores= family_scores + "\t\t\t%s\n" % ( FAMILY_MODEL );
                family_scores= family_scores + "\t\t\t%s\n" % ( FAMILY_DESCRIPTION );
                family_scores= family_scores + "\t\t\t%s\n" % ( FAMILY_SCORE_VALUE );
                family_scores= family_scores + "\t\t\t%s\n" % ( FAMILY_E_VALUE );
                family_scores= family_scores + "\t\t\t%s\n" % ( FAMILY_N_VALUE );
                # COMPLETE FAMILY_SCORE_HIT TAG
                family_scores= family_scores + "\t\t\n";
                # INCREMENT family_scores_counter
                family_scores_counter= family_scores_counter + 1;
        # COMPLETE FAMILY_SCORE_LIST TAG
        family_scores= family_scores + "\t\n";
        return family_scores;

    def getParsedDomainsRaw( self ):
        '''
        Retrieves a list of PARSED DOMAINS
        @param (None)
        @return (List: PARSED DOMAINS)
        '''
        return self.parsed_domains;

    def getNoOfParsedDomains( self ):
        '''
        Retrieves the number of parsed hits per query
        @param (None)
        @return (Integer: number of parsed hits per query)
        '''
        return len( self.parsed_domains );

    def getParsedDomainsML( self ):
        '''
        FINE-GRAINED CONTROL OVER PARSED DOMAINS AND SCORE RESULTS
        @param (None)
        @return ()
        '''
        parsed_domain_list= [];
        PARSED_MODEL= '';
        PARSED_DOMAIN_NUMBER= '';
        PARSED_DOMAIN_SEQ_F= '';
        PARSED_DOMAIN_SEQ_T= '';
        PARSED_DOMAIN_HMM_F= '';
        PARSED_DOMAIN_HMM_T= '';
        PARSED_DOMAIN_2_DOTS= '';
        PARSED_DOMAIN_BRACKETS= '';
        PARSED_DOMAIN_SCORE= '';
        PARSED_DOMAIN_E_VALUE= '';
        parsed_domains_counter= 1;
        # BEGIN PARSED_DOMAINS_LIST TAG
        parsed_domains= '\n';
        for domain in self.getParsedDomainsRaw():
            # IF NO_HITS NOT FOUND, THEN EXTRACT DATA
            if string.find( domain, PfamEntry.NO_HITS ) < 0:
                parsed_domain_list= string.split( domain );
                PARSED_MODEL= parsed_domain_list[0];
                PARSED_DOMAIN_NUMBER= parsed_domain_list[1];
                PARSED_DOMAIN_SEQ_F= parsed_domain_list[2];
                PARSED_DOMAIN_SEQ_T= parsed_domain_list[3];
                #PARSED_DOMAIN_2_DOTS= parsed_domain_list[4];
                PARSED_DOMAIN_HMM_F= parsed_domain_list[5];
                PARSED_DOMAIN_HMM_T= parsed_domain_list[6];
                #PARSED_DOMAIN_BRACKETS= parsed_domain_list[7];
                PARSED_DOMAIN_SCORE= parsed_domain_list[8];
                PARSED_DOMAIN_E_VALUE= parsed_domain_list[9];
                # BEGIN PARSED_DOMAIN_HIT TAG
                parsed_domains= parsed_domains + "\t\t\n" % ( parsed_domains_counter );
                # FORMAT ENTRY TAGS
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_MODEL );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_NUMBER );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_SEQ_F );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_SEQ_T );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_HMM_F );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_HMM_T );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_SCORE );
                parsed_domains= parsed_domains + "\t\t\t%s\n" % ( PARSED_DOMAIN_E_VALUE );
                # COMPLETE PARSED_DOMAIN_HIT TAG
                parsed_domains= parsed_domains + "\t\t\n";
                # INCREMENT parsed_domains_counter
                parsed_domains_counter= parsed_domains_counter + 1;
            else:
                # NO_HITS FOUND
                return domain;
        # COMPLETE PARSED_DOMAINS_LIST TAG
        parsed_domains= parsed_domains + '\n';
        return parsed_domains;

    def getAlignments( self ):
        '''
        Retrieves a list of TOP SCORING ALIGNMENTS
        @param (None)
        @return (List: TOP SCORING ALIGNMENTS)
        '''
        return self.alignments;

    def getRegexFamilyScores( self ):
        '''
        Retrieves the Regex object for Pfam family scores
        @param (None)
        @return (Regex: Regex object for Pfam family scores)
        '''
        return PfamEntry.REGEX_FAMILY_SCORES;

    def __str__( self ):
        '''
        Retrieves a string representation of parser entry class
        @param (None)
        @return (None)
        '''
        strBuffer= '';
        strBuffer= strBuffer + "\n";
        strBuffer= strBuffer + "\t%s\n" % ( self.getQuery() );
        strBuffer= strBuffer + "\t%s?\n" % ( self.getAccession() );
        strBuffer= strBuffer + "\t%s\n" % ( self.getDescription() );
        strBuffer= strBuffer + "\t%s" % ( self.getFamilyScoresML() );
        strBuffer= strBuffer + "\t%s" % ( self.getParsedDomainsML() );
        strBuffer= strBuffer + "\t%s\n" % ( self.getAlignments() );
        strBuffer= strBuffer + "";
        return strBuffer;

class PfamParser:
    '''
    Prototype class for parsing hmmpfam output
    @author: Wagied Davids
    @date: 22.01.2004
    @copyright: Wagied Davids, ?, 2004
    '''
    # DECLARATION OF STATIC DATA
    HMM_HEADER_SEPERATOR=
'-'; HMM_FILE= 'HMM file:'; HMM_SEQ_FILE= 'Sequence file:'; HMM_QUERY_SEQ= 'Query sequence:'; HMM_ACC= 'Accession:'; HMM_DESCRIPTION= 'Description:'; HMM_SCORE_HEADER= 'Scores for sequence family classification (score includes all domains):'; HMM_PARSED_DOMAINS= 'Parsed for domains:'; HMM_ALIGNMENT= 'Alignments of top-scoring domains:'; HMM_SPACE= ' '; HMM_TAB= '\t'; HMM_NEWLINE= '\n'; HMM_ENTRY_SEPERATOR= '//'; HMM_ENTRY_COUNTER= 0; # STATIC DATA STRUCTURE # STATIC REGEX OBJECTS REGEX_HMM_ENTRY= re.compile( r'(Query sequence:\s+\S.*\s+//)', re.MULTILINE | re.DOTALL ); REGEX_HMM_QUERY= re.compile( r'Query sequence:\s+(\S.*?)\s+Accession', re.MULTILINE | re.DOTALL ); REGEX_HMM_ACC= re.compile( r'Accession:\s+(\S.*?)\s+Description', re.MULTILINE | re.DOTALL ); REGEX_HMM_DESCRIPTION= re.compile( r'Description:\s+(\S.*?)\s+Scores', re.MULTILINE | re.DOTALL ); REGEX_HMM_SEQ_FAMILY_SCORES= re.compile( r'(Scores\s+\S.*)\s+Parsed', re.MULTILINE | re.DOTALL ); REGEX_HMM_PARSED_DOMAINS= re.compile( r'(Parsed for domains:\s+\S.*)\s+Alignments', re.MULTILINE | re.DOTALL ); REGEX_HMM_ALIGNMENTS= re.compile( r'(Alignments of top-scoring domains:\s+\S.*)\s+//', re.MULTILINE | re.DOTALL ); def __init__( self, filename= None ): ''' Constructor for PfamParser @param (Filename) @return (None) ''' if filename != None: self.filename= filename; self.debug= 0; self.HMM_FAMILY_SCORES_HITS= {}; self.HMM_PARSED_DOMAINS= {}; else: print 'Please enter filename'; sys.exit ( -1 ); def setDebug( self, debug= 0 ): ''' Sets the debug level when parsing debug= 0 No debug information debug= 1 Pfam Entry level debug information debug= 2 Regex level debug information debug= 3 Incoming data @param (Integer representing the verbosity/ debug level) @return (None) ''' self.debug= debug; return ; def getFilename( self ): ''' Retrieves the filename @param (None) @return (String: Filename) ''' return self.filename; def parse( self ): ''' MAIN PARSER FUNCTION @param (None) @return (None) ''' try: mode= 'r'; 
            line= '';
            data_entry= '';
            HMM_QUERY= '';
            HMM_ACC= '';
            HMM_DESCRIPTION= '';
            HMM_SCORES= '';
            HMM_DOMAINS= '';
            HMM_ALIGNMENTS= '';
            # FAMILY SCORES
            FAMILY_SCORES_TITLE= '';
            FAMILY_SCORES_HEADER= '';
            FAMILY_SCORES_INFO_LIST= [];
            FAMILY_SCORES_LIST= [];
            # PARSED DOMAIN HITS
            HMM_DOMAINS= '';
            PARSED_DOMAINS_INFO_LIST= [];
            PARSED_DOMAINS_TITLE= '';
            PARSED_DOMAINS_HEADER= '';
            PARSED_DOMAINS_LIST= [];
            # DOMAIN ALGINMENTS INFORMATION
            DOMAIN_ALIGN_HEADER= '';
            DOMAIN_ALIGNMENTS_LIST= [];
            # Open file stream for reading
            fopen= open( self.filename, mode );
            while fopen:
                line= fopen.readline();
                if not line: break;
                # Pfam ENTRY DETECTED
                if line[ 0: len( PfamParser.HMM_QUERY_SEQ ) ] == PfamParser.HMM_QUERY_SEQ:
                    data_entry= data_entry + string.rstrip( line ) + PfamParser.HMM_SPACE;
                    while line[ 0: len( PfamParser.HMM_ENTRY_SEPERATOR ) ] != PfamParser.HMM_ENTRY_SEPERATOR:
                        line= fopen.readline();
                        if not line: break;
                        if line[ 0: len( PfamParser.HMM_ENTRY_SEPERATOR ) ] == PfamParser.HMM_ENTRY_SEPERATOR:
                            PfamParser.HMM_ENTRY_COUNTER= PfamParser.HMM_ENTRY_COUNTER + 1;
                            data_entry= data_entry + PfamParser.HMM_SPACE + PfamParser.HMM_ENTRY_SEPERATOR;
                            # EXTRACT PFAM ENTRY INFORMATION
                            # DEBUG INFO
                            if self.debug == 3: print data_entry;
                            # MATCH ENTRY STRUCTURE
                            MATCH_HMM_ENTRY= PfamParser.REGEX_HMM_ENTRY.search( data_entry );
                            # DEBUG INFO
                            if self.debug == 2: print "%s: %s" % ( MATCH_HMM_ENTRY, MATCH_HMM_ENTRY.re.pattern );
                            if MATCH_HMM_ENTRY != None:
                                # DEBUG INFO
                                if self.debug == 2: print "%d. %s" % ( PfamParser.HMM_ENTRY_COUNTER, MATCH_HMM_ENTRY );
                                self.ENTRY= MATCH_HMM_ENTRY.group( 1 );
                                #print self.ENTRY;
                                # MATCH QUERY SEQUENCE
                                MATCH_HMM_QUERY= PfamParser.REGEX_HMM_QUERY.search( self.ENTRY );
                                # DEBUG INFO
                                if self.debug == 2: print "%s: %s" % ( MATCH_HMM_QUERY, MATCH_HMM_QUERY.re.pattern );
                                if MATCH_HMM_QUERY != None:
                                    HMM_QUERY= MATCH_HMM_QUERY.group( 1 );
                                    #print HMM_QUERY, '-> ',
                                # MATCH ACCESSION
                                MATCH_HMM_ACC= PfamParser.REGEX_HMM_ACC.search( self.ENTRY );
                                # DEBUG INFO
                                if self.debug == 2: print "%s: %s" % ( MATCH_HMM_ACC, MATCH_HMM_ACC.re.pattern );
                                if MATCH_HMM_ACC != None:
                                    HMM_ACC= MATCH_HMM_ACC.group( 1 );
                                    #print HMM_ACC;
                                # MATCH DESCRIPTION
                                MATCH_HMM_DESCRIPTION= PfamParser.REGEX_HMM_DESCRIPTION.search( self.ENTRY );
                                # DEBUG INFO
                                if self.debug == 2: print "%s: %s" % ( MATCH_HMM_DESCRIPTION, MATCH_HMM_DESCRIPTION.re.pattern );
                                if MATCH_HMM_DESCRIPTION != None:
                                    HMM_DESCRIPTION= MATCH_HMM_DESCRIPTION.group( 1 );
                                    #print HMM_DESCRIPTION;
                                # MATCH FAMILY SCORES
                                # NB !!! --- MAXIMUM RECURSION LIMIT ---- !!!
                                try:
                                    MATCH_HMM_SCORES= PfamParser.REGEX_HMM_SEQ_FAMILY_SCORES.search( self.ENTRY );
                                    # DEBUG INFO
                                    if self.debug == 2: print "%s: %s" % ( MATCH_HMM_SCORES, MATCH_HMM_SCORES.re.pattern );
                                    if MATCH_HMM_SCORES != None:
                                        HMM_SCORES= MATCH_HMM_SCORES.group( 1 );
                                        FAMILY_SCORES_INFO_LIST= string.split( HMM_SCORES, PfamParser.HMM_NEWLINE );
                                        FAMILY_SCORES_TITLE= FAMILY_SCORES_INFO_LIST[0];
                                        FAMILY_SCORES_HEADER= FAMILY_SCORES_INFO_LIST[1];
                                        #FAMILY_SCORES_TABLINE= FAMILY_SCORES_INFO_LIST[2];
                                        # NOTE: LAST ELEMENT = EMPTY SPACE
                                        # COLLECT IN HASH
                                        FAMILY_SCORES_LIST= FAMILY_SCORES_INFO_LIST[ 3: -1 ];
                                        self.HMM_FAMILY_SCORES_HITS[ HMM_QUERY ]= FAMILY_SCORES_LIST;
                                except RuntimeError, run_err:
                                    print "Error: MATCHING PFAM FAMILY SCORES!"
                                    print run_err;
                                # MATCH PARSED DOMAIN INFORMATION
                                # NB !!! --- MAXIMUM RECURSION LIMIT ---- !!!
                                try:
                                    MATCH_HMM_PARSED_DOMAINS= PfamParser.REGEX_HMM_PARSED_DOMAINS.search( self.ENTRY );
                                    # DEBUG INFO
                                    if self.debug == 2: print "%s: %s" % ( MATCH_HMM_PARSED_DOMAINS, MATCH_HMM_PARSED_DOMAINS.re.pattern );
                                    if MATCH_HMM_PARSED_DOMAINS != None:
                                        HMM_DOMAINS= MATCH_HMM_PARSED_DOMAINS.group( 1 );
                                        PARSED_DOMAINS_INFO_LIST= string.split( HMM_DOMAINS, PfamParser.HMM_NEWLINE );
                                        PARSED_DOMAINS_TITLE= PARSED_DOMAINS_INFO_LIST[0];
                                        PARSED_DOMAINS_HEADER= PARSED_DOMAINS_INFO_LIST[1];
                                        #PARSED_DOMAINS_TABLINE= PARSED_DOMAINS_INFO_LIST[2];
                                        # NOTE: LAST ELEMENT = EMPTY SPACE
                                        # COLLECT IN HASH
                                        PARSED_DOMAINS_LIST= PARSED_DOMAINS_INFO_LIST[ 3: -1 ];
                                        self.HMM_PARSED_DOMAINS[ HMM_QUERY ]= PARSED_DOMAINS_LIST;
                                except RuntimeError, run_err:
                                    print "Error: MATCHING PFAM PARSED DOMAIN INFORMATION!";
                                    print run_err;
                                # MATCH DOMAIN ALIGNMENTS
                                # NB !!! --- MAXIMUM RECURSION LIMIT ---- !!!
                                try:
                                    MATCH_HMM_ALIGNMENTS= PfamParser.REGEX_HMM_ALIGNMENTS.search( self.ENTRY );
                                    # DEBUG INFO
                                    if self.debug == 2: print "%s: %s" % ( MATCH_HMM_ALIGNMENTS, MATCH_HMM_ALIGNMENTS.re.pattern );
                                    if MATCH_HMM_ALIGNMENTS != None:
                                        HMM_ALIGNMENTS= MATCH_HMM_ALIGNMENTS.group( 1 );
                                        DOMAIN_ALIGNMENTS_INFO_LIST= string.split( HMM_ALIGNMENTS , "\n" );
                                        DOMAIN_ALIGN_HEADER= DOMAIN_ALIGNMENTS_INFO_LIST[0];
                                        DOMAIN_HIT_INFO= DOMAIN_ALIGNMENTS_INFO_LIST[1];
                                        DOMAIN_ALIGNMENTS_LIST= DOMAIN_ALIGNMENTS_INFO_LIST[ 3:-2 ];
                                        #print DOMAIN_ALIGNMENTS_LIST;
                                except RuntimeError, run_err:
                                    print "Error: MATCHING PFAM DOMAIN ALIGNMENTS!";
                                    print run_err;
                                # Construct Pfam Entry structure
                                Entry= PfamEntry( HMM_QUERY, HMM_ACC, HMM_DESCRIPTION, FAMILY_SCORES_LIST, PARSED_DOMAINS_LIST, DOMAIN_ALIGNMENTS_LIST);
                                # DEBUG INFO
                                if self.debug == 1: print Entry;
                                #print "%s => %s" % ( Entry.getQuery(), Entry.getDescription() );
                                #print Entry.getFamilyScoresML();
                                #print Entry.getParsedDomainsML();
                            # CLEAN DATA VARIABLE
                            data_entry= '';
                            HMM_QUERY= '';
                            HMM_ACC= '';
                            HMM_DESCRIPTION= '';
                            # FAMILY SCORES INFORMATION
                            HMM_SCORES= '';
                            FAMILY_SCORES_INFO_LIST= [];
                            FAMILY_SCORES_TITLE= '';
                            FAMILY_SCORES_HEADER= '';
                            FAMILY_SCORES_LIST= [];
                            # PARSED DOMAINS INFORMATION
                            HMM_DOMAINS= '';
                            PARSED_DOMAINS_INFO_LIST= [];
                            PARSED_DOMAINS_TITLE= '';
                            PARSED_DOMAINS_HEADER= '';
                            PARSED_DOMAINS_LIST= [];
                            # DOMAIN INFORMATION
                            HMM_ALIGNMENTS= '';
                            DOMAIN_ALIGNMENTS_INFO_LIST= [];
                            DOMAIN_ALIGN_HEADER= '';
                            DOMAIN_HIT_INFO= '';
                            DOMAIN_ALIGNMENTS_LIST= [];
                        # ///////////////////////////////////////// ACCUMULATE DATA //////////////////////////////////////
                        else:
                            data_entry= data_entry + line ;
        except IOError, io_err:
            print io_err;
        else:
            fopen.close();
        return ;

    def getPfamHits( self ):
        '''
        Retrieves Pfam hits QUERY => HITS DATABASE
        @param (None)
        @return (Hash: Pfam hits QUERY => HITS DATABASE)
        '''
        return self.HMM_FAMILY_SCORES_HITS;

    def getNoOfHits( self ):
        '''
        Retrieves the number of Pfam hits
        @param (None)
        @return (Integer: number of Pfam hits)
        '''
        return len( self.getPfamHits() );

    def getPfamParsedDomains( self ):
        '''
        Retrieve Pfam parsed domains QUERY => HITS DATABASE
        @param (None)
        @return (Hash: Pfam parsed domains QUERY => HITS DATABASE)
        '''
        return self.HMM_PARSED_DOMAINS;

    def getNoOfParsedDomains( self ):
        '''
        Retrieves the number of PARSED Pfam hits
        @param (None)
        @return (Integer: number of PARSED Pfam hits)
        '''
        return len( self.getPfamParsedDomains() );

    def getRegexHMM_Entry( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_ENTRY
        @param (None)
        @return (Regex: HMM_ENTRY)
        '''
        return PfamParser.REGEX_HMM_ENTRY;

    def getRegexQuery( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_QUERY
        @param (None)
        @return (Regex: REGEX_HMM_QUERY)
        '''
        return PfamParser.REGEX_HMM_QUERY;

    def getRegexAccession( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_ACC
        @param (None)
        @return (Regex: REGEX_HMM_ACC)
        '''
        return PfamParser.REGEX_HMM_ACC;

    def getRegexDescription( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_DESCRIPTION
        @param (None)
        @return (Regex: REGEX_HMM_DESCRIPTION)
        '''
        return PfamParser.REGEX_HMM_DESCRIPTION;

    def getRegexFamilyScores( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_SEQ_FAMILY_SCORES
        @param (None)
        @return (Regex: REGEX_HMM_SEQ_FAMILY_SCORES)
        '''
        return PfamParser.REGEX_HMM_SEQ_FAMILY_SCORES;

    def getRegexParsedDomains( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_DOMAINS
        @param (None)
        @return (Regex: REGEX_HMM_DOMAINS)
        '''
        return PfamParser.REGEX_HMM_PARSED_DOMAINS;

    def getRegexAlignments( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_ALIGNMENTS
        @param (None)
        @return (Regex: REGEX_HMM_ALIGNMENTS)
        '''
        return PfamParser.REGEX_HMM_ALIGNMENTS;

    def __str__( self ):
        '''
        Retrieves a string representation of parser class
        @param (None)
        @return (String: Retrieves a string representation of parser class)
        '''
        strBuffer= 'ParserType: PfamParser';
        strBuffer= strBuffer + "Filename: %s" % ( self.getFilename() );
        return strBuffer;

-------------- next part --------------
#!/usr/bin/env python
###################################################################################################### */
# COPYRIGHT INFORMATION
# Pfam DOMAIN RESULTS PARSER
# @AUTHOR: Wagied Davids
# @DATE: 22.01.2004
###################################################################################################### */
import string
import PfamParser

# Module level re-name
PfamParser= PfamParser.PfamParser;

# DATA LOCATION
filename= 'hmmpfam_output.example';

# DATA STRUCTURE
PFAM_DB= {};

# Construct Parser
parser= PfamParser( filename );

# SET DEBUG LEVEL
parser.setDebug( 1 );

# parse document
parser.parse();

# retrieve Pfam hits
PFAM_DB= parser.getPfamParsedDomains();
counter= 1;
for QUERY in PFAM_DB.keys():
    for HIT in PFAM_DB[ QUERY ]:
        print "%d. %s => %s" % ( counter, QUERY, HIT );
        counter= counter + 1;
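
For anyone who wants to try the section-splitting regexes without the full parser, here is a minimal Python 3 sketch. It reuses four of the patterns from the PfamParser class above and the same column positions that getParsedDomainsML reads. The sample record, the extract_fields helper and the dictionary keys are made up for illustration and are not part of the posted module.

```python
import re

# Patterns copied from the PfamParser class above.
FLAGS = re.MULTILINE | re.DOTALL
REGEX_HMM_QUERY = re.compile(r'Query sequence:\s+(\S.*?)\s+Accession', FLAGS)
REGEX_HMM_ACC = re.compile(r'Accession:\s+(\S.*?)\s+Description', FLAGS)
REGEX_HMM_DESCRIPTION = re.compile(r'Description:\s+(\S.*?)\s+Scores', FLAGS)
REGEX_HMM_PARSED_DOMAINS = re.compile(
    r'(Parsed for domains:\s+\S.*)\s+Alignments', FLAGS)

# Invented record, laid out like one hmmpfam entry (assumption: real
# output has the same section titles in the same order).
SAMPLE_ENTRY = """Query sequence: seq1
Accession:      [none]
Description:    hypothetical example protein

Scores for sequence family classification (score includes all domains):
Model    Description        Score    E-value  N
-------- -----------        -----    ------- ---
SH2      Src homology 2     85.1     2.3e-24   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
SH2        1/1      10    92 ..     1    79 []    85.1  2.3e-24

Alignments of top-scoring domains:
SH2: domain 1 of 1, from 10 to 92: score 85.1, E = 2.3e-24
//
"""


def extract_fields(entry):
    """Hypothetical helper: pull the header fields and domain rows
    out of a single entry string."""
    fields = {"query": None, "accession": None,
              "description": None, "domains": []}
    for key, regex in (("query", REGEX_HMM_QUERY),
                       ("accession", REGEX_HMM_ACC),
                       ("description", REGEX_HMM_DESCRIPTION)):
        match = regex.search(entry)
        if match is not None:
            fields[key] = match.group(1)

    match = REGEX_HMM_PARSED_DOMAINS.search(entry)
    if match is not None:
        # Skip the title, column header and dashed rule, then drop blanks.
        rows = [r for r in match.group(1).split("\n")[3:] if r.strip()]
        for row in rows:
            cols = row.split()
            if len(cols) < 10:
                continue  # e.g. a "no hits" line
            # cols[4] is the '..' marker and cols[7] the '[]' marker.
            fields["domains"].append({
                "model": cols[0], "domain": cols[1],
                "seq_from": cols[2], "seq_to": cols[3],
                "hmm_from": cols[5], "hmm_to": cols[6],
                "score": cols[8], "e_value": cols[9],
            })
    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE_ENTRY))
```

Checking each match for None before calling group() keeps a malformed record from raising AttributeError, which the debug-level-2 prints in parse() do not guard against.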