[Biopython-dev] hmmpfam parser
    Wagied 
    Wagied.Davids at ebc.uu.se
       
    Fri Jan 30 15:48:10 EST 2004
    
    
  
Hi,
I have some code which is able to parse hmmer output,
as well as code donated by Joanne Adamkewicz from Exilexis.
If you guys/gals find it useful, updates and modification will be done!
Wagied Davids
Dept.of Molecular Evolution
Uppsala University
Sweden
-------------- next part --------------
###########################################################################
# Copyright (c) 1997-2004 Exelixis Pharmaceuticals, Inc. All Rights Reserved.
#
#  Permission is hereby granted, free of charge, to any person obtaining
#  a copy of this software and associated documentation files (the
#  "Software"), to deal in the Software without restriction, including
#  without limitation the rights to use, copy, modify, merge, publish,
#  distribute, sublicense, and/or sell copies of the Software, and to
#  permit persons to whom the Software is furnished to do so, subject to
#  the following conditions:
#  
#  The above copyright notice and this permission notice shall be
#  included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY CLAIM, DAMAGES OR OTHER LIABILITY WHATSOEVER,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION OR
# OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
# USE, PERFORMANCE OR OTHER DEALINGS IN THE SOFTWARE.
# 
# 
#   Module Notes :
# 
#     hmmpfam related routines
# 
#   Original Authors : Joanne Adamkewicz, Darren Platt
# 
###########################################################################
import re 
# To make valid hyperlinks, append the domain accession number to the end of the url string.
accno2url = {'PF': 'http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?',
             'SM': 'http://smart.embl-heidelberg.de/smart/do_annotation.pl?BLAST=DUMMY&ACC=',
             'CO': 'http://www.ncbi.nlm.nih.gov/cgi-bin/COG/COG_info.plx?',
             'TI': 'http://www.tigr.org/tigr-scripts/CMR2/hmm_report.spl?user=access&password=access&acc='
             }
def parseHmmpfam(outputfile, debug=0):
    """
    Purpose:
        hmmpfam is one of the algorithms in Sean Eddy's HMMER software.  This function will
        parse hmmpfam output that was generated either WITH or WITHOUT the --acc option
        If with, the first column of every hit and domain contains the accno (PF00595)
        If without, the 1st column contains the domain name (pkinase).  You can't get both
        in the same output. Whichever one is present is returned as the first element in
        each tuple.  In the code below, the variable is 'accno' for simplicity.
        **NOTE THAT NAMES DO NOT NECESSARILY UNIQUELY DETERMINE ACCNOS, if you are running
        against an HMM library with models from more than one source.  i.e., names are unique
        within Pfam, but not when you merge both Pfam and Smart models.  Example:
        SM00542 and PF05965 both have the name 'FYRC'.  Therefore, if both are in your library,
        you can't determine which model gave the hit unless you use --acc.
        Also, Pfam itself changes the names of its models occasionally, but accnos are stable.
        You are STRONGLY RECOMMENDED to get in the habit of running hmmpfam with the --acc option!
    Arguments:
        outputfile   - (string) - path to text file containing raw hmmpfam output
        debug - 0 or 1 - whether to print debugging statements to stdout
    Returns:
        [A,B] list, where A and B are each tuples of data as follows:
       A = hit results - 5 element tuple of:
         accno of hmm - string
         score - float
         evalue - float
         N (number of occurences of hmm domain in the seq) - int
         description of hmm - string
       B = domain results - 11 element tuple of:
         accno of hmm - string
         x/y - string (e.g. 1/2 = '1 of 2' -- identifies which occurence of the domain)
         seq-from - int
         seq-to   - int
         model-from - int
         model-to   - int
         score - float
         eval  - float
         modelAlign - seq string of the model in the alignment
         matchString - consensus string for model<->seq alignment
         subjectAlign - seq string of sequence in the alignment
       Note that proteins with no domain hits will return ( [], [] )
    Raises:
        'ParseError', with error message to indicate the problem
    """
    if debug:
        print "Now opening file '%s'" % outputfile
    s = open(outputfile, 'r')
    trigger="Description.*Score.*E-value.*N" # header for first data chart
    hits = []     # will be returned
    domains = []  # temporary data holder
    domains2 = [] # will be returned
    stage = 0     # parser has 5 stages, from top to bottom
    while 1:
        line = s.readline()
        if debug:
            print line
            print "stage = %s" % stage    
        if not line or line == None:
            # EOF
            break
        if stage == 0:
            #
            # Passing over all the header stuff until we get to the first data table
            #
            if re.search(trigger,line):
                s.readline()
                if debug: print "**Change to stage 1**"
                stage = 1
        elif stage == 1:
            #
            # Reading hits from first data table
            #
            if line.find('no hits') != -1:
                # This protein was not hit by any models
                # return empty lists
                if debug: print "No hits found!"
                return (hits, domains)
            cols = line.split()
            if len(cols) == 0:
                # Blank line = end of table
                if debug: print "**Change to stage 2**"
                stage = 2
            else:
                accno = cols[0] # if run without --acc option, this is actually model name, not accno
                desc = ' '.join(cols[1:-3]) # note that model descriptions are truncated to 38 char in raw output
                score = float(cols[-3])
                eval = float(cols[-2])
                n = int(cols[-1])
                hits.append( (accno, score, eval, n, desc) )
        elif stage == 2:
            #
            # Waiting for domains
            #
            if line.find('Domain') != -1:
                stage = 3
                if debug: print "**Change to stage 3**"
                s.readline()
        elif stage == 3:
            #
            # Parsing domains
            #
            cols = line.split()
            if len(cols) == 0:
                stage = 4
                if debug: print "**Change to stage 4**"
            else:
                accno = cols[0]
                count = cols[1] # x of y
                seqFrom = int(cols[2])
                seqTo   = int(cols[3])
                hmmFrom = int(cols[5])
                hmmTo   = int(cols[6])
                score = float(cols[-2])
                eval  = float(cols[-1])
                # Note this 8-element tuple doesn't include the alignment
                # strings, because we haven't gotten there yet.
                domains.append( (accno, count, seqFrom, seqTo, hmmFrom, hmmTo, score, eval) )
        elif stage == 4:
            #
            # Waiting for alignments
            #
            if line.startswith('Alignments of top-scoring domains'):
                stage = 5
                if debug: print "**Change to stage 5**"
        elif stage == 5:
            #
            # Parsing alignments - see sample raw output for help understanding this code
            #
            # At the start of each new model hit alignment, there is an info line with accno,
            # score, and e-value of the hit.  Then comes the alignment itself in three-line
            # groups (model, consensus, sequence), with a blank line between each group, until
            # the end of the domain is reached.
            # The parser numbers the lines with variable names as follows:
            #    one - the info line, only occurs once per domain
            #    two - model sequence
            #    three - consensus sequence
            #    four  - protein sequence
            #    five - blank line
            # Thus, two three four and five will repeat several times for each 'one' line.
            # Each time, we will grab and append the contents to the variable of the appropriate name.
            #
            # Note that the model sequence begins and ends with a * character, which helps us
            # find the last group-of-three in our parsing.
            #
            # One complication: for some domain outputs, an extra 'RF' line is inserted above
            # the model sequence ('two') line in each group of three, making it a group of four.
            # This parser can handle that case; it ignores the line (doesn't return it).
            #
            one = line # get the current line
            if one.strip() == '//':
                # End-of-output marker found
                return (hits,domains2);
            
            two=''
            three=''
            four=''
            check=s.readline()
            # check for, and ignore, the extra 'RF' lines as they appear
            if check.strip() and check.split()[0] == 'RF':
                check=s.readline()              
            st=check.index('*')
            begin= check.strip()
            end=len(begin)
            if begin.strip() and begin.split()[0] == 'RF':
                begin= s.readline().strip()
            if begin.count('*') == 2:
                # Both the start and the end of the model sequence are on this line,
                # so the entire thing only consists of one three-line group.
                two = begin.strip()[3:-3] # model line starts with *-> and ends with <-*, ignore those chars
                three = s.readline().strip()
                four = s.readline().split()[2] # the seq line has flanking start and stop integers, we don't want those
                five = s.readline()
                if five.strip() != '':
                        raise "ParseError",'Stage 5: expected blank line'
            else:
                # This is the more usual case, the alignment is spread out over
                # >1 three-line group.
                flag=1
                read =1
                while (flag==1):
                    two = two + begin.strip() # append current 'two' line
                    temp=s.readline() # this is 'three'
                    if temp == None:
                        # Yikes! - hit end of file parsing, shouldn't
                        # happen.
                        raise "ParseError",'unexpected end of PFam result'
                    temp = temp[st:st+end] # read only the characters from * to the end
                    three += temp # append current 'three' line
                    four += s.readline().split()[2] # see note above about flanking integers
                    five = s.readline()
                    if five.strip() != '':
                        raise "ParseError",'Stage 5: expected blank line'
                    if begin.strip()[-1] == '*':
                        # We hit the end of the model sequence
                        flag=0
                    if read != 0:
                        begin= s.readline().strip()
                        if begin.strip() and begin.split()[0] == 'RF':
                            # skip the RF line
                            begin= s.readline().strip()
                    if begin.strip()[-1]=='*':
                        read=0
                three=three[3:-1]  # append the last bit we found
                two=two[3:-3]
            # We are done parsing this domain alignment, save the results and go to next domain
            domains2.append(domains[0]+(two,three,four))
            domains = domains[1:]
    # All done!
    return (hits, domains2)
# 'RF' example: resultid 14693470
-------------- next part --------------
hmmpfam - search one or more sequences against HMM database
HMMER 2.2g (August 2001)
Copyright (C) 1992-2001 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /usr/local/biotools/lib/Pfam
Sequence file:            query
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query sequence: gi|24649473|ref|NP_651199.2|
Accession:      [none]
Description:    [none]
Scores for sequence family classification (score includes all domains):
Model           Description                             Score    E-value  N 
--------        -----------                             -----    ------- ---
SM00045         Diacylglycerol kinase accessory domai   262.2    7.8e-76   1
PF00609         Diacylglycerol kinase accessory domai   237.0    2.9e-68   1
PF00781         Diacylglycerol kinase catalytic domai   167.8      2e-47   1
SM00046         Diacylglycerol kinase catalytic domai   166.6    4.6e-47   1
SM00109         Protein kinase C conserved region 1 (   149.8    5.3e-42   3
PF00130         Phorbol esters/diacylglycerol binding   142.6    7.6e-40   3
PF00788         Ras association (RalGDS/AF-6) domain     60.5      4e-15   1
SM00314         Ras association (RalGDS/AF-6) domain     58.4    1.8e-14   2
PF03107         DC1 domain                               25.0    0.00019   2
COG1597         Predicted kinase related to diacylgly   -73.3     0.0053   1
SM00360         RNA recognition motif                    11.0     0.0059   1
PF00076         RNA recognition motif. (a.k.a. RRM, R    11.2      0.015   1
PF00628         PHD-finger                               -9.1       0.09   1
SM00249         PHD zinc finger                           6.2       0.18   1
SM00184         Ring finger                              -0.8       0.47   1
PF01500         Keratin, high sulfur B2 protein         -83.9        1.1   1
PF04928         Poly(A) polymerase central domain       -47.5        1.2   1
SM00361         RNA recognition motif                    -4.9        1.3   1
COG0284         Orotidine-5'-phosphate decarboxylase,  -107.0        2.3   1
PF02376         CUT domain                              -42.8        3.1   1
PF00412         LIM domain                              -24.0        3.1   1
SM00215         von Willebrand factor (vWF) type C do   -22.0        3.9   1
SM00217         Four-disulfide core domains             -21.1          4   1
PF04041         Domain of unknown function (DUF377)    -168.6        4.9   1
PF04014         SpoVT / AbrB like domain                -14.1        5.1   1
PF03768         Attacin, N-terminal region              -14.7        5.2   1
PF00219         Insulin-like growth factor binding pr   -23.6        5.4   1
PF01021         TYA transposon protein                 -296.2        5.6   1
PF03154         Atrophin-1 family                      -731.6        5.8   1
SM00343         zinc finger C2HC, DNA-binding            -4.4        6.3   1
PF04236         Tc5 transposase C-terminal domain       -23.8        6.3   1
SM00336         B-Box-type zinc finger, protein inter   -18.9        6.4   1
PF03208         Prenylated rab acceptor (PRA1)          -89.5        6.5   1
PF00503         G-protein alpha subunit                -308.3        6.9   1
COG0008         Glutamyl- and glutaminyl-tRNA synthet  -325.3        7.1   1
PF03792         PBX domain                             -116.2        7.7   1
PF03302         Giardia variant-specific surface prot  -292.4        8.7   1
PF04395         Poxvirus B22R protein                  -557.5        8.9   1
PF04396         Protein of unknown function, DUF537     -40.4          9   1
SM00463         Small MutS-related domain               -20.0        9.4   1
SM00157         Major prion protein                     -87.0        9.5   1
PF01391         Collagen triple helix repeat (20 copi   -86.6        9.5   1
Parsed for domains:
Model           Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------        ------- ----- -----    ----- -----      -----  -------
PF00130           1/3       6    55 ..     1    51 []    74.6  2.3e-19
SM00109           1/3       6    55 ..     1    61 []    68.3  1.8e-17
PF03107           1/2      17    48 ..     1    44 []     4.9     0.39
PF01500           1/1      32   183 ..     1   177 []   -83.9      1.1
PF03302           1/1      36   303 ..     1   412 []  -292.4      8.7
PF00412           1/1      36    81 ..     1    62 []   -24.0      3.1
PF00130           2/3      69   116 ..     1    51 []    13.1  0.00045
SM00109           2/3      69   116 ..     1    61 []    30.5  4.3e-06
SM00336           1/1      77   106 ..     1    51 []   -18.9      6.4
PF03107           2/2      80   109 ..     1    44 []    20.1   0.0057
PF00628           1/1      81   119 ..     1    51 []    -9.1     0.09
SM00249           1/1      81   152 ..     1    39 []     6.2     0.18
SM00184           1/1      82   151 ..     1    23 []    -0.8     0.47
PF04236           1/1      91   140 ..     1    69 []   -23.8      6.3
SM00215           1/1     100   166 ..     1   105 []   -22.0      3.9
PF00130           3/3     135   185 ..     1    51 []    54.9  1.9e-13
SM00109           3/3     135   185 ..     1    61 []    51.0  2.9e-12
SM00217           1/1     138   174 ..     1    51 []   -21.1        4
SM00343           1/1     147   162 ..     1    17 []    -4.4      6.3
PF00219           1/1     148   197 ..     1    84 []   -23.6      5.4
PF04928           1/1     183  1018 ..     1   205 []   -47.5      1.2
PF04395           1/1     189   835 ..     1  1361 []  -557.5      8.9
PF03154           1/1     256  1058 ..     1  1046 []  -731.6      5.8
PF01021           1/1     416   830 ..     1   440 []  -296.2      5.6
PF03768           1/1     469   519 ..     1    72 []   -14.7      5.2
SM00157           1/1     678   903 ..     1   221 []   -87.0      9.5
COG0008           1/1     697  1149 ..     1   592 []  -325.3      7.1
PF03792           1/1     706   876 ..     1   209 []  -116.2      7.7
PF01391           1/1     761   818 ..     1    60 []   -86.6      9.5
PF00503           1/1     819  1019 ..     1   362 []  -308.3      6.9
SM00314           1/2     831   922 ..     1   102 []     9.9    0.011
PF04041           1/1     887  1149 ..     1   363 []  -168.6      4.9
PF00788           1/1     923  1024 ..     1   113 []    60.5    4e-15
SM00314           2/2     923  1024 ..     1   102 []    48.5  1.7e-11
PF04396           1/1     937  1001 ..     1   115 []   -40.4        9
PF02376           1/1     969  1022 ..     1    88 []   -42.8      3.1
SM00361           1/1    1031  1103 ..     1    91 []    -4.9      1.3
SM00360           1/1    1031  1103 ..     1   121 []    11.0   0.0059
PF00076           1/1    1032  1102 ..     1    77 []    11.2    0.015
PF03208           1/1    1098  1256 ..     1   162 []   -89.5      6.5
COG1597           1/1    1117  1512 ..     1   332 []   -73.3   0.0053
PF00781           1/1    1119  1267 ..     1   154 []   167.8    2e-47
SM00046           1/1    1119  1267 ..     1   157 []   166.6  4.6e-47
COG0284           1/1    1333  1511 ..     1   266 []  -107.0      2.3
SM00045           1/1    1334  1489 ..     1   195 []   262.2  7.8e-76
PF00609           1/1    1334  1489 ..     1   190 []   237.0  2.9e-68
SM00463           1/1    1346  1416 ..     1    94 []   -20.0      9.4
PF04014           1/1    1447  1487 ..     1    47 []   -14.1      5.1
Alignments of top-scoring domains:
PF00130: domain 1 of 3, from 6 to 55: score 74.6, E = 2.3e-19
                   *->HrFkrttfyksptfCdhCgellwglakQGlkCsnCglnvHkrChekV
                      H F+ +tf ++pt+C hC +llwgl+ QG+ C++C++ +H+rC++ V
  gi|2464947     6    HSFVKKTF-HKPTYCHHCSDLLWGLIQQGYICEVCNFIIHERCVSSV 51   
                   ptnC<-*
                   +t+C   
  gi|2464947    52 VTPC    55   
SM00109: domain 1 of 3, from 6 to 55: score 68.3, E = 1.8e-17
                   *->Hkfvfrtf.kptfCdvCrksiwgsfkqaaksqglrCseCkvkcHkkC
                      H+fv++tf+kpt+C++C +++wg+ +q     g+ C++C++ +H++C
  gi|2464947     6    HSFVKKTFhKPTYCHHCSDLLWGLIQQ-----GYICEVCNFIIHERC 47   
                   aekvpaqshksglsC<-*
                   ++ v         +C   
  gi|2464947    48 VSSVVT-------PC    55   
PF03107: domain 1 of 2, from 17 to 48: score 4.9, E = 0.39
                   *->fsCdvCerkidpgsngffYsCskeegCndeeetsdyfvhdvrCa<-*
                        C  C   + +     +Y C +   Cn        f+++ rC+   
  gi|2464947    17    TYCHHCSDLLWG-LIQQGYICEV---CN--------FIIHERCV    48   
PF01500: domain 1 of 1, from 32 to 183: score -83.9, E = 1.1
                   *->qtScCGfptCStlgtrPsCGsscCQPsCCe...SCCQpsCcqpSCCq
                      q+  C    C        C+ss   P  C +   C        +   
  gi|2464947    32    QGYICEV--CNFII-HERCVSSVVTP--CSgiaPCIIKNPVAHCWSE 73   
                   PtcsqtscCqPtcfqs..........sCCrPsCcqTSCCq....PtCcqs
                   Pt     +C  +c    ++++  +   C     ++   Cq+   P C  +
  gi|2464947    74 PTHHKRKFCT-VCRKRldetpavhclVCEYFAHIE---CQdfavPDCTEN 119  
                   ssCqtgCgigGsiGyGQeGsSGAvScrirWCRPdCrvegtClPpCCvvsC
                       +g                 v  +  W R     ++t++   C  +C
  gi|2464947   120 ATYVPGKELL------------NVKHQHHW-R-EGNLPSTSKCAYCKKTC 155  
                   taPTCCqpvsaQasCCRPsCqPyCgqsCCRPaCccsvtCtrTccePc<-*
                   ++  C   + ++          +Cg       C+ +         P+   
  gi|2464947   156 WSSECLTGYRCE----------WCG-MTTHAGCRMY--------LPT    183  
PF03302: domain 1 of 1, from 36 to 303: score -292.4, E = 8.7
                   *->CaeCklGyelsadktkcetsaPPdCkveNCkaCsnekeeNevCeeCn
                      C+ C   +++     +c++s      v  C    +   +N v   C 
  gi|2464947    36    CEVCN--FII---HERCVSSV-----VTPCSGIAPCIIKNPV-AHCW 71   
                   SgfyLtpnTsqCidaCakiGnyYaqTnaqnKkiCkeCtvAnCktCedqGq
                   S     p T      C  +      T a     C  C+   + +C d + 
  gi|2464947    72 S----EP-THHKRKFCTVCRKRLDETPA---VHCLVCEYFAHIECQDFAV 113  
                   cqaCndGfYksGdaCsPChes..cKTCsgGTaSdCTeCltGkaLrYGnDg
                    +  +  +Y  G +       ++ +  + +  S+C  C            
  gi|2464947   114 PDCTENATYVPGKELLNVKHQhhWREGNLPSTSKCAYC------------ 151  
                   TKGtC.GegCttgtGaGPaCkTCGLtIDGtsYCSeCateteyPqNGVCtS
                    K tC    C tg      C+ CG t                        
  gi|2464947   152 -KKTCwSSECLTGY----RCEWCGMT------------------------ 172  
                   taaRatatCkdstvanGvCssCanGyl..rmnGGCYeTtKfPGKSVCeea
                       + a C                yl++  n G  +   +P  SV +  
  gi|2464947   173 ----THAGCR--------------MYLptECNFGILQPIYLPPHSVSIPR 204  
                   ngggDTCqkeapGYkldsgdLvvCSeGCktCtssTvCttCadGyvkdggs
                                                         +    +vk+  s
  gi|2464947   205 TEVP--------------------------------IEAIIGVQVKSKTS 222  
                   dv....CtkCDssCeTCTaGatttCktCaTGYYKsgtgcvkCtssesdSn
                    v++ +C+  D sC    aG+             + +g     +   + +
  gi|2464947   223 LVrdysCPSPDLSCPIPGAGSGSL----------TSLGLKELLELHRQ-R 261  
                   gitGVkgClsCAPP.snnkGSV.lCYLikdss.sGGnSTNKSGLSTGAIA
                           l   PP++ + GS++lC     ss + G   N          
  gi|2464947   262 LEQSKQHFLLSTPPtPTSCGSIsLCHSPTPSSlTVGETSN---------- 301  
                   GIsVAvviVVGGLVGFLCWWFiCRGKA<-*
                                             A   
  gi|2464947   302 -------------------------EA    303  
PF00412: domain 1 of 1, from 36 to 81: score -24.0, E = 3.1
                   *->CagCnkpIydrevvrralnkvwHpeCFrCavCgkpLtegdefyekdg
                      C  Cn  I++r+v+  +                +p++    + +k+ 
  gi|2464947    36    CEVCNFIIHERCVS-SVV---------------TPCSGIAPCIIKNP 66   
                   skelYC..khDyyklfg<-*
                        C +++ ++k+++   
  gi|2464947    67 -V-AHCwsEPTHHKRKF    81   
PF00130: domain 2 of 3, from 69 to 116: score 13.1, E = 0.00045
                   *->HrFkrttfyksptfCdhCgellwglakQGlkCsnCglnvHkrChekV
                      H++   t  ++ +fC +C+++l  ++   ++C +C + +H +C+ ++
  gi|2464947    69    HCWSEPTH-HKRKFCTVCRKRLDETP--AVHCLVCEYFAHIECQDFA 112  
                   ptnC<-*
                   +++C   
  gi|2464947   113 VPDC    116  
SM00109: domain 2 of 3, from 69 to 116: score 30.5, E = 4.3e-06
                   *->Hkfvfrtf.kptfCdvCrksiwgsfkqaaksqglrCseCkvkcHkkC
                      H ++++t++k++fC vCrk++   +       ++ C +C + +H  C
  gi|2464947    69    HCWSEPTHhKRKFCTVCRKRLDETP-------AVHCLVCEYFAHIEC 108  
                   aekvpaqshksglsC<-*
                   ++++ +       +C   
  gi|2464947   109 QDFAVP-------DC    116  
SM00336: domain 1 of 1, from 77 to 106: score -18.9, E = 6.4
                   *->eraplCeeHgd..eepaeffCveedgallCrdCdeageHqanklfrg
                      +++  C++++++ +e+  + C         ++C+ +  H        
  gi|2464947    77    HKRKFCTVCRKrlDETPAVHC---------LVCEYF-AH-------- 105  
                   Hrvvll<-*
                        +   
  gi|2464947   106 -----I    106  
PF03107: domain 2 of 2, from 80 to 109: score 20.1, E = 0.0057
                   *->fsCdvCerkidpgsngffYsCskeegCndeeetsdyfvhdvrCa<-*
                      + C vC++++d+     + +C +   C+       yf h+ +C    
  gi|2464947    80    KFCTVCRKRLDE---TPAVHCLV---CE-------YFAHI-ECQ    109  
PF00628: domain 1 of 1, from 81 to 119: score -9.1, E = 0.09
                   *->yCsvCgkvdddaggdllqCDgCdrwfHlaClgppleeppegkWlCpe
                      +C vC+k  d  +   + C  C+   H++C ++  +           
  gi|2464947    81    FCTVCRKRLD--ETPAVHCLVCEYFAHIECQDFAVP----------D 115  
                   Ctpk<-*
                   Ct+    
  gi|2464947   116 CTEN    119  
SM00249: domain 1 of 1, from 81 to 152: score 6.2, E = 0.18
                   *->yC.vCgk....g.llqCdkgCdrwyHv.Clgpple............
                      +C+vC+k+ ++ + + C   C+   H++C +++ ++ +++ +  +++
  gi|2464947    81    FCtVCRKrldeTpAVHCL-VCEYFAHIeCQDFAVPdctenatyvpgk 126  
                   ..............epdg.wyCprCk<-*
                   +  + +++++ ++++ +++ +C  Ck   
  gi|2464947   127 ellnvkhqhhwregNLPStSKCAYCK    152  
SM00184: domain 1 of 1, from 82 to 151: score -0.8, E = 0.47
                   *->CpICle.......pvvlpCgH.FCr.Ci...................
                      C++C  + ++++ + +l C    + +C +   ++ +++ +  ++++ 
  gi|2464947    82    CTVCRKrldetpaVHCLVCEYfAHIeCQdfavpdctenatyvpgkel 128  
                   ...................CPlC<-*
                    + ++++++++++ +++++C  C   
  gi|2464947   129 lnvkhqhhwregnlpstskCAYC    151  
PF04236: domain 1 of 1, from 91 to 140: score -23.8, E = 6.3
                   *->dshdeFlTPsqYCfgvsGhvdtviCyftgCqnlaFIrCARCKkfPar
                            +TP+ +C+     ++     +  Cq++a   C     +   
  gi|2464947    91    ------ETPAVHCLV----CE--YFAHIECQDFAVPDCTENATY--V 123  
                   tGknfiCFnHfVvsefhacpcp<-*
                    Gk++       v  +h  +     
  gi|2464947   124 PGKEL-----LNVKHQHHWREG    140  
SM00215: domain 1 of 1, from 100 to 166: score -22.0, E = 3.9
                   *->CqNAvnnGsyYppLNkGakWdDiALtGRtEDtdDCsnrCtClnGrvs
                      C        y++++  +++++            DC+ + t   G+ +
  gi|2464947   100    C-------EYFAHIECQDFAV-----------PDCTENATYVPGKEL 128  
                   lCtkvwCgpkpClLhgslsKSSnlsgeCplgqgcvpslsdqKqYtvHGDC
                   l  k +  +++    g l+    +++ C+    + +s s+         C
  gi|2464947   129 LNVKHQHHWRE----GNLP----STSKCAYCKKTCWS-SE---------C 160  
                   fsvltsP.C<-*
                      lt ++C   
  gi|2464947   161 ---LTGYrC    166  
PF00130: domain 3 of 3, from 135 to 185: score 54.9, E = 1.9e-13
                   *->HrFkrttfyksptfCdhCgellwgla.kQGlkCsnCglnvHkrChek
                      H+++  ++ +s+  C++C++ +w     +G++C++Cg++ H+ C + 
  gi|2464947   135    HHWREGNL-PSTSKCAYCKKTCWSSEcLTGYRCEWCGMTTHAGCRMY 180  
                   VptnC<-*
                   +pt+C   
  gi|2464947   181 LPTEC    185  
SM00109: domain 3 of 3, from 135 to 185: score 51.0, E = 2.9e-12
                   *->Hkfvfrtf.kptfCdvCrksiwgsfkqaaksqglrCseCkvkcHkkC
                      H++++++ ++ ++C++C+k++w+s   +    g+rC++C+++ H  C
  gi|2464947   135    HHWREGNLpSTSKCAYCKKTCWSSECLT----GYRCEWCGMTTHAGC 177  
                   aekvpaqshksglsC<-*
                      +p         C   
  gi|2464947   178 RMYLPT-------EC    185  
SM00217: domain 1 of 1, from 138 to 174: score -21.1, E = 4
                   *->KpGsCPwvqlpiiasCplgnppnkCssDsqCpGnkKCCengCGKksC
                      ++G  P      +++   + ++  C+s s+C   + C    CG    
  gi|2464947   138    REGNLP-----STSK--CAYCKKTCWS-SECLTGYRC--EWCG---- 170  
                   ltPv<-*
                   +t     
  gi|2464947   171 MTTH    174  
SM00343: domain 1 of 1, from 147 to 162: score -4.4, E = 6.3
                   *->kCynCGkeGHiardCpk<-*
                      kC +C+k+  ++++C     
  gi|2464947   147    KCAYCKKTC-WSSECLT    162  
PF00219: domain 1 of 1, from 148 to 197: score -23.6, E = 5.4
                   *->CprPcGGpCpaerlarCpPgPpvaPpaecaelvredGCGCClvCArq
                      C+      C       C  +       ec  + r      C+ C   
  gi|2464947   148    CA-----YCKKT----CWSS-------ECLTGYR------CEWCGMT 172  
                   eGeaCGvytPrDeskGLyCarGaedaakaLrCrpppG<-*
                     + C  y+P       +C++G      +L+   +p    
  gi|2464947   173 THAGCRMYLPT------ECNFG------ILQPIYLPP    197  
PF04928: domain 1 of 1, from 183 to 1018: score -47.5, E = 1.2
                RF    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          
                   *->stkqyGvtkpislagpkekdvkltesLieeLkefgsf..........
                      ++ ++G+ +pi l+ p++  + +te  ie++  +++ ++++  ++ +
  gi|2464947   183    TECNFGILQPIYLP-PHSVSIPRTEVPIEAIIGVQVKsktslvrdys 228  
                RF                                                   
                   ..................................................
                    ++++ + + ++ ++++ ++ + ++  + ++++ ++++++   +++++++
  gi|2464947   229 cpspdlscpipgagsgsltslglkellelhrqrleqskqhfllstpptpt 278  
                RF                                                   
                   ..................................................
                   + ++ +  +++++++ + ++++++ ++++++++++++++++++++++++ 
  gi|2464947   279 scgsislchsptpssltvgetsneaeqdrerdqdqpeeepeeenteqdsa 328  
                RF                                                   
                   ..................................................
                    + ++++++  ++ ++ ++ +++ +   ++  ++ ++++++++++ ++++
  gi|2464947   329 lqlttstsnvignlqkwpsansslhllytnlfrklgqgkrrrkrgissgg 378  
                RF                                                   
                   ..................................................
                    ++++++++ +++  + ++++ +++ ++ +   ++++ +++++++ ++++
  gi|2464947   379 lspsededdvdggvcdisggdlsddydhcdvalrrrslrsrqprdvsetd 428  
                RF                                                   
                   ..................................................
                    +++ + + ++++ ++++  ++++++++ +++++ +++ +  ++ + +++
  gi|2464947   429 yhgdaeaeaegetvprescyetsdtggeltntddldsslnlisnlsynss 478  
                RF                                                   
                   ..................................................
                   ++++  + +++ + ++ +++ +++++ ++++++  + ++++++++++   
  gi|2464947   479 nnsnacnvpggatapdarntattsttapgksghalsvqggrqqpktgala 528  
                RF                                                   
                   ..................................................
                   + ++++++   ++++ ++++++ +++ +++++++ ++ +++ + +  + +
  gi|2464947   529 qikpkpkpilmpkhkaqgkggslssplsnsnssdcssaspsapatllqls 578  
                RF                                                   
                   ..................................................
                   + +++++ +++   +  ++ ++ +++  +++++++++++  + +++++ +
  gi|2464947   579 pvgrsksfqesaaitavsrykkygrglfqrrrskrspknavgvggksnys 628  
                RF                                                   
                   ..................................................
                    ++ +++ + + ++++++ ++ +++ +   ++ + ++ +++ + ++   +
  gi|2464947   629 ldrlsqnieitiqdedgnfhpyddnyhmlagrldatdvdddvgfddlyld 678  
                RF                                                   
                   ..................................................
                   +++++ +++    ++ ++++ +++++ ++ ++++  ++  ++ +++ + +
  gi|2464947   679 drpsgasddvafagdisdggassrsrasdasdghvlgrllrqvrqglsvg 728  
                RF                                                   
                   ..................................................
                    ++++ ++++ ++ +++ +++++++ +++++ ++ ++++++++++ ++++
  gi|2464947   729 wrkpryqkrrarsiseefssgdtprfkdeesaskaesghgpssggagggg 778  
                RF                                                   
                   ..................................................
                   ++++ ++++  + + +  ++++++ +++++++++++++++++++++++++
  gi|2464947   779 gsggaggssaagasasaaggssghyrpdsgsghksdksekdrekkekere 828  
                RF                        xxxxxxxxxxxxxxxxxxxxxxxxxxx
                   .......................eilkLVPnkyevFrltLraiKlWAkrr
                   +++ +  +  +++++ ++++ +  i+++     +  +++Lra  +    +
  gi|2464947   829 ekdiemikvfdgnnsfrrqqyrvIIVQRTYTLEQLLTTALRAFHITRDPQ 878  
                RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   giYsNvlGFlgGvaWAiLVAriCQlYPnavpstlvekfFlvfsqWlrhnw
                   + Y   l  + G     +   +    P  +   l  k   ++ +   h++
  gi|2464947   879 AFYLTDLYAPAGMEDTPMLDPT----PVLNLVHLEGKRPAIYLRF--HDR 922  
                RF xxx   xxxxxxxxxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxx
                   pnP...VlLkeinsdsieernlqvrvWdprknk...sDricyhlmPiiTP
                      + +V+  ++++  +e+   +v+v ++ + k+  +D+  ++ +     
  gi|2464947   923 DRGhvrVYPGKLQCSMLEDPYVSVPVDNSTVIKdliRDA--LDKFGLQDN 970  
                RF xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  
                   Ay.PqqnstynVsestlkvileefkrgleildeielgkaeWskLfeka<-
                     + + +s     +   + il+  +r  +i++++ + ++  ++L+  +  
  gi|2464947   971 QIqDYRCSEVLLDRGVTERILSWNERPWDIMKQLGKDSIRQMELMRFY   1018 
                RF  
                   *
                    
  gi|2464947     -   -    
PF04395: domain 1 of 1, from 189 to 835: score -557.5, E = 8.9
                   *->lfsltilaIyiliteSegyetClRKtplYHdtqkkiepKentDhkAs
                      +++ ++l+    +++S               +  + +p e +     
  gi|2464947   189    ILQPIYLP---PHSVS---------------IPRTEVPIEAI----- 212  
                   AtykYLsiaekkEkerflesFnWtkIkeeVKdaFirkCdlssnkdRLdgv
                                            I   VK+        +       + 
  gi|2464947   213 -------------------------IGVQVKS--------K-------TS 222  
                   ykYNYtiaysltVskksekktkgtdiestykkitknivastlslskvdee
                   +  +Y+   s + s   + +   ++ +s ++k + ++  + l+ sk    
  gi|2464947   223 LVRDYS-CPSPDLSCPIP-GAGSGSLTSLGLKELLELHRQRLEQSK---- 266  
                   yTFttiiyaTvtssleTsSvPiddrSsdyvntiaikiLikvLdVNETele
                                                                     
  gi|2464947     - -------------------------------------------------- -    
                   aylisnESLimAkyinttknkdskvdfnlPkvehitYenskCnNiTvdkV
                                              ++P+  +   ++s C   T +  
  gi|2464947   267 ----------------------QHFLLSTPPTPTSCGSISLCHSPTPSSL 294  
                   tIGnFSvidvdsaenakedIrIiFkGvStsdPYvdSDkfieCitkkInnc
                   t+G     +++ ae+  e                d D+ +e   +  ++ 
  gi|2464947   295 TVGE----TSNEAEQDRE---------------RDQDQPEE---EPEEEN 322  
                   knsndvkgkvkveKsvTsNCekCsMgLMaeVtsvPeEFnnTLKenGikdD
                   ++  +     + + +v++N                      L++     +
  gi|2464947   323 TEQDSALQLTTSTSNVIGN----------------------LQKWPSANS 350  
                   dlteLYNFYlCmltnnddCseYvpLtekikedtlksLssYsliktsrsRr
                    l  L                Y +L +k               ++ + Rr
  gi|2464947   351 SLHLL----------------YTNLFRK--------------LGQGKRRR 370  
                   KsRPRRnAGDsRDtdeeteiS.sEdLe......CmYlsYdtddDDDredd
                   K          R +   + +S+sEd ++ +++ C     d +dD      
  gi|2464947   371 K----------RGIS-SGGLSpSEDEDdvdggvCDISGGDLSDD------ 403  
                   drydqCvnspekeItaKsRkkRsdseeknekRKqsYKnRPKRsLdddltd
                     yd+C           sR +R  se + ++             d    +
  gi|2464947   404 --YDHC-DVALRRRSLRSRQPRDVSETDYHG-------------DAEAEA 437  
                   ylKKyLgie.eVIPkkAsHlQVGistsYgkseedgViGDs.....sIysd
                           e+e  P++  +      ts  ++e ++   D+ +++  ++s+
  gi|2464947   438 --------EgETVPRESCY-----ETSDTGGELTNT--DDldsslNLISN 472  
                   vKdrAkkllekimPsvPldTdpeslyakirkptkikLPpdsKnivtealr
                   ++  +++   + + +vP  ++  + + +  ++t                 
  gi|2464947   473 LSYNSSN--NSNACNVPGGATAPDARNTATTSTT---------------- 504  
                   siieqKqeSvkevLkteselssssieeaetegkskhkssveteivvLskd
                                                 +gks h +sv          
  gi|2464947   505 -----------------------------APGKSGHALSVQGG------- 518  
                   DldvkenysrkglvsriddepvyedirsvdrlkekirdyrkkGGkkkess
                            r   +++          ++ + +k  +    k +Gk    +
  gi|2464947   519 ---------RQQPKTG-------ALAQIKPKPKPILMPKHKAQGK----G 548  
                   isvlkevsrtssgmfdvDtStvvvkPsrrkitsasrnfessskpsrrlss
                    s+       ss+++++++S+   + +  + t  +    + sk+  ++  
  gi|2464947   549 GSL-------SSPLSNSNSSDCSSASPSAPATLLQLSPVGRSKSFQESAA 591  
                   develeyeknyrdSlepekssssrkrCkrglnkAvcaiLgrvplpeknnn
                    + ++ y k+y      +++   r+r kr+ + Av         +++  n
  gi|2464947   592 ITAVSRY-KKY------GRGLFQRRRSKRSPKNAV--------GVGGKSN 626  
                   dvvkdaravssvvdskrsSSaslySllPgvdtgeAAaagniardRqanaq
                                          ySl+   + +e                
  gi|2464947   627 -----------------------YSLDRLSQNIE---------------- 637  
                   venesitTPltRraaaaRrfqqGRvpdrgetnlvnelqklpls...tsqL
                      + i++   + +                 ++ +  + l ++ + t   
  gi|2464947   638 ---ITIQDEDGNFH-----------------PYDDNYHMLAGRldaTDVD 667  
                   snsvykeavqlstsgdesllqvpqRpsqsvvqgstPvrpsPPlpPardrl
                   ++ +  +++   ++++ s+  + +++ +    +s ++r s+         
  gi|2464947   668 DDVGFDDLYLDDRPSGASDDVAFAGDISD-GGASSRSRASDASDG----- 711  
                   rrPlaAiipedsipkskgipkvvsprlRrStsGvvcGMlQSkvksdgtYs
                                        v+++ lR    G + G    +        
  gi|2464947   712 --------------------HVLGRLLRQVRQGLSVGWRKPR-------- 733  
                   LvqlPiDGYPGnPArRPLPRIPiRsDssDssDHiYEtiGsRsRsYAGssG
                                                     Y    +Rs s   ssG
  gi|2464947   734 ----------------------------------YQKRRARSISEEFSSG 749  
                   .tHYnAiegSssdagsiessslesssgipkdkvvvgdrSgtssGGrrsGR
                   +t     e S s a s +++                  Sg+ +GG++sG 
  gi|2464947   750 dTPRFKDEESASKAESGHGP-----------------SSGGAGGGGGSGG 782  
                   rnsvrseSgySsddsevsmEGSVYqPSiKElnsksskkYkekMkkISsSf
                    ++ +      ++ s +              +  ss+ Y+   +  + + 
  gi|2464947   783 AGGSSA-----AGASASA-------------AGGSSGHYRPD-SGSGHKS 813  
                   DKsmaFglAmQligQqaInrqsRseriqkddrdkaEkvFEAVStsLSTiG
                   DKs                         + dr k Ek             
  gi|2464947   814 DKS-------------------------EKDREKKEK------------- 825  
                   ttmttAGIiaSPhLAfAGMGLSlISGLIDtGKDIYYlfSGkekPeDPlvK
                                                                     
  gi|2464947     - -------------------------------------------------- -    
                   kFNtYrelVsDtskmGVRKClmPGsDltIYlaYRNDSSFkPslEkLaLyF
                                                                     
  gi|2464947     - -------------------------------------------------- -    
                   iDtIdSvLYYLNTSnIIlDysLtVACPIGyLRSPdLDITAYTiLKFtTed
                                                                   e+
  gi|2464947   826 ----------------------------------------------EREE 829  
                   nVKFYqFtRLGAMLSKfPvVrLTCGrdiTLT<-*
                                            +di      
  gi|2464947   830 -------------------------KDIEMI    835  
PF03154: domain 1 of 1, from 256 to 1058: score -731.6, E = 5.8
                   *->ekhssRtfrargs..astlrsGRkkyPastdGvlSPvnedvrskGrn
                      e h  R  +++     st+       P s  G +S     +  +   
  gi|2464947   256    ELHRQRLEQSKQHflLSTPPT-----PTSC-GSIS-----LCHSPTP 291  
                   aaSavstssNdsK......aeavkksakkVkeeaASglknTKrqrekVas
                   +   v+  sN+  ++++++++ +   +     e  S l  T      ++ 
  gi|2464947   292 SSLTVGETSNEAEqdrerdQDQPEEEPEEENTEQDSALQLTTSTSNVIG- 340  
                   dtldsDRAaskkakfqevsRPNlPse.gEGEssdlRslNdesaSdPklid
                    +l      +      +    Nl    g G     R +   + S+  + d
  gi|2464947   341 -NLQK---WPSANSSLHLLYTNLFRKlGQGKRRRKRGISSGGLSPSEDED 386  
                   QdnRslsgslPSPqDnEsDsDyaaqQqMlqlqPgalkaPslAaSAPsslP
                    d   + g +       s  D +       +   al+  sl    P  + 
  gi|2464947   387 -D---VDGGVCDI----SGGDLSDDYDHCDV---ALRRRSLRSRQPRDVS 425  
                   PassslPaPGPtrfaysvssssSaAaSsssssssSsvaPaaasLiQalPs
                          a             +      s   +s +      +    l s
  gi|2464947   426 ETDYHGDAE--------AEAEGETVPRESCYETSDTGGELTNT--DDLDS 465  
                   lHPhrlPsPhtsLsvstaPPkytsAQPslPsqalhsQGPPgPhslqtGrL
                        l         s+   +y+                    s      
  gi|2464947   466 S----LN------LISNL--SYN--------------------SS----- 478  
                   LansnahPqPFGLtPq...SsqaQstlgPsPvaaHhHstiQlqasQsalQ
                     nsna   P G t ++ + +   st  P   + H  s         +  
  gi|2464947   479 -NNSNACNVPGGATAPdarNTATTSTTAP-GKSGHALSV--------QGG 518  
                   qQQhhrneqPlPPaalamPLEGGssHHikPyatsPsLGslrqlPagqAHk
                    QQ+         +ala           kP +                 +
  gi|2464947   519 RQQPKT-------GALAQI---------KPKPKPI--------------L 538  
                   hPPHLSqvSyfsanaNlPPvssalkslSSlStgsyPsaHPsPlQLgPQsa
                    P H         +  +   s+ ++s SS      Psa    lQ      
  gi|2464947   539 MPKH------KAQGKGGSLSSPLSNSNSSDCSSASPSAPATLLQ------ 576  
                   PlPfsPvqPtvlTsSasLstviatvASsPaGYKTasPPGlhqvgkraPfP
                       sPv   +   S   s  i t+ S     K +   Gl q  +   +P
  gi|2464947   577 ---LSPV---GRSKSFQESAAI-TAVSRYK--KYG--RGLFQRRRSKRSP 615  
                   GAyktavPgGykPisPPSFRtGtPPGYRtssPPAGPGtFKPGSssvqPGP
                      k av  G k       R          +     G F P         
  gi|2464947   616 ---KNAVGVGGKSNYSLD-RLSQNI---EITIQDEDGNFHPYDDNYHM-- 656  
                   lsaAvsSGlPslPPPPaAPasGpPLsAvQIKeEa.ldEaEePESPvPPaR
                           l        A      L A+ + +  + d             
  gi|2464947   657 --------L--------AGR----LDATDVDDDVgFDDL----------- 675  
                   SPSPePkVVDvPSHASQSARFyKHL.DRGyNSCAR.sDLYFvPLeGSKLA
                            D PS AS    F    +D G  S  R sD       G  L 
  gi|2464947   676 ------YLDDRPSGASDDVAFAGDIsDGGASSRSRaSDASDGHVLGRLLR 719  
                   KKRedlvEKvkREAEQkAREEkEREkEkEkEkEREREkERElERavkkAs
                     R  l    +    Qk R     E        R    E     ++ kA 
  gi|2464947   720 QVRQGLSVGWRKPRYQKRRARSISEEFSSGDTPRFKDEE-----SASKAE 764  
                   ssAHEGRAPledPsLsGPvhmRPsFEPgPsavAaVPPYlGPDTPALRTLS
                        G      Ps  G +    s   g s  A      G    A    S
  gi|2464947   765 ----SGH----GPSSGGAGGGGGSGGAGGSSAA------GASASAAGGSS 800  
                   EYARPHVMSPtNRNHPFYvPLnavDPGLLaYnvPaLYsvDPaiRERELRE
                      RP                   D G           +D     RE +E
  gi|2464947   801 GHYRP-------------------DSG-------SGHKSDKSEKDREKKE 824  
                   REiREREi............RERdLR....dRlKPGFEVKPsELdPLHgv
                   +E  E  i+  +  +++++ R    R     R           L   H  
  gi|2464947   825 KEREEKDIemikvfdgnnsfRRQQYRviivQRTYTLEQLLTTALRAFHIT 874  
                   tnPGldhFaRHsaLalqPGaaGlHPFasFHPs....LnPLERERLALAAG
                     P    F     L+    +aG    +   P +  +L  LE  R A    
  gi|2464947   875 RDP--QAFY----LTDLYAPAGMEDTPMLDPTpvlnLVHLEGKRPAIY-- 916  
                   PaLRPdMSYadRLAAERiHAERvAsLtsDPLARLQMlNVTPHHHQHSHIH
                     LR    + dR    R H                   V P   Q S   
  gi|2464947   917 --LR----FHDR---DRGH-----------------VRVYPGKLQCSML- 939  
                   SHLHLHQQDalHaaSAsPVHPLvDPLaaGsHLaRiPYPaGTLPNPLLgqP
                           D+    S       vD  +    L R           L    
  gi|2464947   940 -------EDP--YVSVP-----VDNSTVIKDLIR---------DALDKFG 966  
                   lHEnEvLRHqlFaaPYPRDLPaalsa....PMSAAHQLQAMHAQSAELQR
                   l  n +           R     +   +++P     QL        EL R
  gi|2464947   967 LQDNQIQDYRCSEVLLDRGVTERILSwnerPWDIMKQLGKDSIRQMELMR 1016 
                   LAlEQQqWLHa.HhhlHsvhLP...aQEDYYSrLKKEsDKqL<-*
                     +   q  H++   l  + LP++ +Q  Y   L K         
  gi|2464947  1017 FYMQHKQDPHGpNIALFVGNLPtglSQRNYEQILNKYVTDEN    1058 
PF01021: domain 1 of 1, from 416 to 830: score -296.2, E = 5.6
                   *->MESQQLsQnsri.lHGSAyASVTSKEVh..............sNQDP
                      + S Q    s  + HG A A      V++++  ++++++++ +N D 
  gi|2464947   416    LRSRQPRDVSETdYHGDAEAEAEGETVPrescyetsdtggelTNTDD 462  
                   LdVSASkleEfdkdSTKvNSQQeTTPasSAVPENhHHvSPQtAs......
                   Ld S   +      S   NS     P+    P     +   t  ++++++
  gi|2464947   463 LDSSLNLISNLSYNSSN-NSNACNVPGGATAPDARNTATTSTTApgksgh 511  
                   vhsPQNG.qYqQqgMMTqNkAnaSnWafYqqPSMityshYQ......tSP
                     s Q G+q +  g + q k  +       +P +++    Q+++++  SP
  gi|2464947   512 ALSVQGGrQQPKTGALAQIKPKP-------KPILMPKHKAQgkggslSSP 554  
                   a..YyqPdPqyqlPQYissvGtPLSTsSPdsidsftdsSevdsdeTkvkk
                    ++    d  +  P  +      L   SP       + S +  +  + kk
  gi|2464947   555 LsnSNSSDCSSASPSAPA----TLLQLSPVGRSKSFQESAAITAVSRYKK 600  
                   yVlPPhtLTSeedFstWVKfYIkFLkNSNLGdIIPtvnGkikRQiTddEl
                   y +    L          K  +     SN        n +i  Q  d   
  gi|2464947   601 YGR---GLFQRRRSKRSPKNAVGVGGKSNYSLDRLSQNIEITIQDEDGNF 647  
                   aylYNTFQiFAPfqlLPTWVKdILevdYaDIlkvLsKSveKMQsdtQElk
                         +   A  +l  T V d  +v + D           +      + 
  gi|2464947   648 HPYDDNYHMLA-GRLDATDVDD--DVGFDDLY---------LDDRPSGAS 685  
                   DivaLANLeYdGSTsADaFEikVstIIdRLkeNnInvsdklACQLIlkGL
                   D va A    dG  s        s           +v  +l  Q + +GL
  gi|2464947   686 DDVAFAGDISDGGASSRSRASDAS---------DGHVLGRLLRQ-VRQGL 725  
                   SGdyKyLRytrrrklNMklaeLFldIqlIYdEnkisrlsKPsyrknhSde
                   S  ++  Ry++rr             + I +E   s    P ++ + S  
  gi|2464947   726 SVGWRKPRYQKRRA------------RSISEE--FSSGDTPRFKDEESAS 761  
                   KNvSRsytNTTktKViaRNyQkTNsSKskaAkAHNvaTSskfsrvdNDsI
                   K  S   +                sS    A A     Ss   r d  s 
  gi|2464947   762 KAESGHGPSSGGAGGGGGSGGAGGSS-AAGASASAAGGSSGHYRPDSGSG 810  
                   skSTvesiyLsddndLsLrqetk<-*
                    kS  +      +       e k   
  gi|2464947   811 HKSDKSEKDREKKEK---EREEK    830  
PF03768: domain 1 of 1, from 469 to 519: score -14.7, E = 5.2
                   *->leGSltlNsdGgsdArlklkVplvGndknnvsaeVFAlGsvdlndqg
                      l   l+ Ns+  s A   + Vp  G           A      +d+ 
  gi|2464947   469    LISNLSYNSSNNSNA---CNVP--GG----------AT----APDAR 496  
                   kpvtaGaglAldNvnGHGLSLTkth<-*
                     +t++ +  + +  GH+LS+++++   
  gi|2464947   497 NTATTSTT--APGKSGHALSVQGGR    519  
SM00157: domain 1 of 1, from 678 to 903: score -87.0, E = 9.5
                   *->kkrPkPGGGWntGGsRYPGqgsPGGnrYPpqgggGgWGqPhGGgWGq
                          +P G   +    + G  s GG    +++   + G+  G    q
  gi|2464947   678    --DDRPSG--ASDDVAFAGDISDGGASSRSRASDASDGHVLGRLLRQ 720  
                   PHgG...gWGqPHgGgWGqPHGgggWgqGGGthnqWnkPsKPKtnlKH..
                      G + gW +P             ++ G  t    +  s  K +  H++
  gi|2464947   721 VRQGlsvGWRKPRYQKRRARSISEEFSSGD-TPRFKDEESASKAESGHgp 769  
                   vAGAAAAGAvvGGLGGYmLGsams..........rPliHFGndyED.RYY
                     G A  G   GG GG     a ++  ++++++ rP    G   + +   
  gi|2464947   770 SSGGAGGGGGSGGAGGSSAAGASAsaaggssghyRPDSGSGHKSDKsEKD 819  
                   rEnmyRYPnqvyYrPvDqYsnqnnfvHDCvnitvKqHtvttttKGEnFtE
                   rE   +  +      +  +   n+f      + + q t t     E    
  gi|2464947   820 REKKEKEREEKDIEMIKVFDGNNSFRRQQYRVIIVQRTYTL----EQLLT 865  
                   tDvKimErvveqmCitqYqkEsqAyyqRgasvvlfssPpv<-*
                   t ++     +       Y  +  A +      +l  +P+    
  gi|2464947   866 TALRAFH--ITRDPQAFYLTDLYAPAGMEDTPMLDPTPVL    903  
COG0008: domain 1 of 1, from 697 to 1149: score -325.3, E = 7.1
                   *->alvNAi.h.GKAn.kAVMGkvm.enpelRsma.ea.eiv.nfieqvn
                       ++   ++ ++A+++ V+G +++           ++ +++++     
  gi|2464947   697    GGASSRsRaSDASdGHVLGRLLrQ----VRQGlSVgWRKpRY----- 734  
                   smslmekk.lle.lype.................................
                         +k+++ ++++e +++++++ +++++ ++ ++++++++++ ++++
  gi|2464947   735 ------QKrRARsISEEfssgdtprfkdeesaskaesghgpssggagggg 778  
                   .................................LpelevmgkVrTRFAPS
                   ++++ ++++  + + +  ++++++ ++++++++  ++ ++++ +      
  gi|2464947   779 gsggaggssaagasasaaggssghyrpdsgsghKSDKSEKDREK------ 822  
                   PTGyLHIGgARtALfNylfARhygGkFiLRIEDTDpTeRstpea.eeaIl
                                                      + e++++e ++e+I+
  gi|2464947   823 -----------------------------------K-EKEREEKdIEMIK 836  
                   edLkWLGlnWDegpdvGGpYgpyyQSeRfdiYyeyaekLieeGkAYyCyc
                          ++   +         ++Q     +Y+  + +          + 
  gi|2464947   837 V------FDGNNS--------FRRQ-----QYRVIIVQR---------TY 858  
                   tpEELealRGtltregaeapgrdprYdgnlrlltkmeegeypageGeppv
                   t E+L     t++r+  + +     Y   l   + me+   ++     pv
  gi|2464947   859 TLEQLLT---TALRAFHITRDPQAFYLTDLYAPAGMEDTPMLDP---TPV 902  
                   vRfKvplegep.k.lnivfrDlvkGrIvfanad.....ilhDfvilRsDG
                   + + v leg +++     f+D  +G ++  ++  + + + ++ v      
  gi|2464947   903 LNL-VHLEG-KrPaIYLRFHDRDRGHVRVYPGKlqcsmLEDPYV------ 944  
                   yPTYnFAVVVDDhlMGITHViRGeDhlsNTprQillyeAlGwpvtwepPv
                          V+VD                ++T + +l+ +Al+         
  gi|2464947   945 ------SVPVD----------------NSTVIKDLIRDALDKFG------ 966  
                   faHlplilneglSKrklkkledgkKLSKRdgpRaptveayRrrGylPEAl
                         + +            + + +             + +rG++    
  gi|2464947   967 ------LQD-----------NQIQDYRCSEV--------LLDRGVT---- 987  
                   rNflallGvwspddddqEifsleelirkFdlervskspavfDpkKLewlN
                        ++   s+ + ++ +  + +l ++          ++  ++ ++++ 
  gi|2464947   988 ----ERIL--SW-N-ERPWDIMKQLGKD----------SIRQMELMRFYM 1019 
                   aeyikeelddeplhpllkpflphpeaGerelpftrelkkdidyidredle
                    +  k   +  p+  l++  l  p+                  +++ + e
  gi|2464947  1020 QH--KQ-DPHGPNIALFVGNL--PTG-----------------LSQRNYE 1047 
                   ellplvkerlktlkelrlltryffeapdvvedadedvakklfkeedkevL
                   ++l+         k + ++ +++   p    +++   +  l  e+  + +
  gi|2464947  1048 QILN---------KYVTDENKFISIGP---IYYEYGSV-VLTFEDSMKAV 1084 
                   eklkekLeklkgvihWtpeeie.aikvrlaeelglKgkklfmplRvalTG
                   +++++ L+++   i +++   +  +   l++ +   +  ++  +R  l+ 
  gi|2464947  1085 RAFYN-LRET---IIEDK---KlLVL-LLPNIE---PSMVPSDVRPLLVF 1123 
                   saegpelfetiellGkeeqleRlgyalad<-*
                     ++++  + +el++   ++  l + ++    
  gi|2464947  1124 VNVKSGGCQGLELIS---SFRKLLNPYQV    1149 
PF03792: domain 1 of 1, from 706 to 876: score -116.2, E = 7.7
                   *->segtvrhdkrkdIgdlLqevlkItdqtLDeeqvNakKhqLkchpmkr
                      s ++ +      +g lL +v     q L    v  +K+  +  + + 
  gi|2464947   706    SDASDG----HVLGRLLRQV----RQGL---SVGWRKPRYQKRRARS 741  
                   AlfdVLcEiKeKtvLSvrnmkdeeppdPqlmRLDnMLvAEGVAGPdkGG.
                          E  +  +  +   kdee             +AE   GP  GG 
  gi|2464947   742 -----ISEEFSSGD--TPRFKDEESAS----------KAESGHGPSSGGa 774  
                   .........GaAAsllaaqasgGtSlsidGaDsalehsdYRqkLlqiRri
                   ++++++++ G+  ++ a+++++G+S     +Ds   h   ++   + R  
  gi|2464947   775 gggggsggaGGSSAAGASASAAGGSSGHYRPDSGSGHKSDKS--EKDREK 822  
                   yenElkkYekaCneFtehVenlLreQSrtRPItqkeiErmvniisrKFns
                    e+E +  +    +      +  r+Q r            v i++r +  
  gi|2464947   823 KEKEREEKDIEMIKVFDGNNSFRRQQYR------------VIIVQRTYT- 859  
                   iqvqLKQstCEaVmiLrsRFLD<-*
                      qL  ++  a  i r    D   
  gi|2464947   860 -LEQLLTTALRAFHITR----D    876  
PF01391: domain 1 of 1, from 761 to 818: score -86.6, E = 9.5
                   *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp
                        + + G     G +G  G  G aG+   +G++ ++ + G+ G+  p
  gi|2464947   761    -SKAESGHGPSSGGAGGGGGSGGAGGSSAAGAS-ASAAGGSSGHYRP 805  
                   pGppGapGapGpp<-*
                       G + ++ +    
  gi|2464947   806 DSGSGHKSDKSEK    818  
PF00503: domain 1 of 1, from 819 to 1019: score -308.3, E = 6.9
                   *->seeekeqakrnkeIekqLkqekkkakrevKLLLLGAGESGKSTIlKQ
                        e+ke+++  k+Ie     ++    r                    
  gi|2464947   819    DREKKEKEREEKDIEMIKVFDGNNSFR-------------------- 845  
                   MKIIHgnGFSqEEkkeyrpvIyqNivqsmrvlvdAmetLgIpfgdperea
                                + +yr +I q ++   ++l  A  +++I+ +++    
  gi|2464947   846 -------------RQQYRVIIVQRTYTLEQLLTTALRAFHITRDPQ---- 878  
                   seadavmiletaketeeveeplpkeyadaikaLWkDpGiqecfdRsrEfq
                       a ++ + +++++   e  p+    ++  L   +G + +++ +    
  gi|2464947   879 ----AFYLTDLYAPAG--MEDTPMLDPTPVLNLVHLEGKRPAIYLR---- 918  
                   LnDSAkYFLdnldRisdpdYiPTeQDILrsRvkTTGIvEtkFsvkkltFR
                           + + dR + + Y                        +kl+  
  gi|2464947   919 --------FHDRDRGHVRVY-----------------------PGKLQ-- 935  
                   MfDVGGQRSERKKWIHCFEdVTAIIFlVALSEYDQvLfEDettNRMqESL
                                                +S     + ED+ +       
  gi|2464947   936 -----------------------------CS-----MLEDPYV------- 944  
                   kLFdsIcNnrwFvntSiILFLNKkDLFeEKIkktpssisd.yFPeYedys
                       s+       n+ +I     kDL ++ + k    ++d+   +Y+  +
  gi|2464947   945 ----SVPVD----NSTVI-----KDLIRDALDKFG--LQDnQIQDYR--C 977  
                   .....sGppqdyeaAkeFIkkkFvslnrnnekpkKeIYsHfTCATDTnnI
                   ++     +          + ++ +s n+++ +                  
  gi|2464947   978 sevllDRG----------VTERILSWNERPWD------------------ 999  
                   rfVFdaV.kDiIlqenLkecGL<-*
                     +++  +kD I q+ L+   +   
  gi|2464947  1000 --IMKQLgKDSIRQMELMRFYM    1019 
SM00314: domain 1 of 2, from 831 to 922: score 9.9, E = 0.011
                   *->dtyvlrVyvddlsavdpgqtyktlrvskrtTardViqqllekfhltd
                      d + ++V+++ +s   + q+y+++ v  + T ++++  +l+ fh+t 
  gi|2464947   831    DIEMIKVFDGNNSF--RRQQYRVIIVQRTYTLEQLLTTALRAFHITR 875  
                   edpeeYvLvevlp.sggkErvLlddenPlqlqklwprdaksprqsslrFv
                    dp+ + L   + + g ++  +ld + +l l+ l +++        + ++
  gi|2464947   876 -DPQAFYLTDLYApAGMEDTPMLDPTPVLNLVHLEGKR--------PAIY 916  
                   Lrkrdd<-*
                   Lr +d+   
  gi|2464947   917 LRFHDR    922  
PF04041: domain 1 of 1, from 887 to 1149: score -168.6, E = 4.9
                   *->elrKiptipileRpsyitGfdsriennPiiGRgpvrkpvavFNPavv
                       +  ++  p l  p+    ++  +                     v+
  gi|2464947   887    APAGMEDTPMLD-PT----PVLNL---------------------VH 907  
                   lyegeLrVYaRfVmlYrayvediatfrIgLadssdGRCSeinfkkepepv
                   l +++  +Y Rf+   r+ v  + ++ +    ++d+    ++ +     v
  gi|2464947   908 LEGKRPAIYLRFHDRDRGHV-RVYPGKLQCSMLEDP---YVSVPVDNSTV 953  
                   v..lPedkwElwGPsYvEDPRvvkigkryymTYTGydgkyarlcvattkn
                   +++l +d  +++G     D     i +        y    + l+ ++t  
  gi|2464947   954 IkdLIRDALDKFG---LQD---NQIQD--------YRCSEVLLDRGVTER 989  
                   lltwarlgNGeWvkfaefelnedrislwtksgaifPvKinGkyvmyfris
                    l+w   +   W +               k+     + ++    m ++  
  gi|2464947   990 ILSWNERP---WDIMKQLG----------KDSIRQMELMR--FYMQHK-- 1022 
                   DnvHdldsniwLavSnvddlvhWenerepSYidvgsprpgmfdapFElKi
                    + H +++  +  v+n + ++  +n+          + ++ +++     i
  gi|2464947  1023 QDPHGPNIALF--VGNLPTGLSQRNYEQI------LNKYVTDEN---KFI 1061 
                   GwgtPPveteeGwSLLVLiHGvNvaGrytenlvYRvGaaLlDlegRPskv
                     g+  +  e+G    VL      +     +  Y +   +       +++
  gi|2464947  1062 SIGP--IYYEYGS--VVLTF---EDSMKAVRAFYNLRETIIEDKK-LLVL 1103 
                   lartpeYILePeeewEvyGdvpnVVFPcgalvdegtgrvliyYGaADtav
                   l+   e+ ++P+ +      +p  VF     v  g             + 
  gi|2464947  1104 LLPNIEPSMVPSDV------RPLLVF---VNVKSG------------GCQ 1132 
                   GLAeipGdleelmnflke<-*
                   GL  i+   ++l+n       
  gi|2464947  1133 GLELIS-SFRKLLNPYQV    1149 
PF00788: domain 1 of 1, from 923 to 1024: score 60.5, E = 4e-15
                   *->dqgvlrvyfqdllsvtpgvayKtirvssedtapdViqeaLeKfrldd
                      d g++rvy++ l      ++y ++ v +++ + d+i++aL Kf+l+d
  gi|2464947   923    DRGHVRVYPGKLQCSMLEDPYVSVPVDNSTVIKDLIRDALDKFGLQD 969  
                   RMedpeeYaLvevlltregalesggkerkLpddenPlqlrlnlprddrrs
                     + +++Y   evl      l++g++er+L+ +e+P  ++++l++d   s
  gi|2464947   970 --NQIQDYRCSEVL------LDRGVTERILSWNERPWDIMKQLGKD---S 1008 
                   vrqqsslrFlLkrrdd<-*
                   +rq    rF+++ ++d   
  gi|2464947  1009 IRQMELMRFYMQHKQD    1024 
SM00314: domain 2 of 2, from 923 to 1024: score 48.5, E = 1.7e-11
                   *->dtyvlrVyvddlsavdpgqtyktlrvskrtTardViqqllekfhltd
                      d +++rVy++ l      ++y+++ v++ t  +d+i+++l kf+l+d
  gi|2464947   923    DRGHVRVYPGKLQCSMLEDPYVSVPVDNSTVIKDLIRDALDKFGLQD 969  
                   edpeeYvLvevlp.sggkErvLlddenPlqlqklwprdaks.prqsslrF
                   ++ ++Y   evl+++g++Er+L  +e+P  ++k +++d  s ++ + +rF
  gi|2464947   970 NQIQDYRCSEVLLdRGVTERILSWNERPWDIMKQLGKD--SiRQMELMRF 1017 
                   vLrkrdd<-*
                   +++++ d   
  gi|2464947  1018 YMQHKQD    1024 
PF04396: domain 1 of 1, from 937 to 1001: score -40.4, E = 9
                   *->eesaeakTsVfWDvEdCPvPdGldarrVapnIksALeksGYpGpVSI
                         +++ +sV         P + + + +   I+ AL k         
  gi|2464947   937    SMLEDPYVSV---------PVD-NSTVIKDLIRDALDK--------- 964  
                   tAYGdltkiprdtfilvsstiqllraLsstGIsLkhvPaGdkKdArdkki
                     +G + ++  d        ++ +++L+++G+                  
  gi|2464947   965 --FGLQDNQIQD--------YRCSEVLLDRGV------------------ 986  
                   lvdillWaldNppPanlm<-*
                   +  il+W   N  P  +m   
  gi|2464947   987 TERILSW---NERPWDIM    1001 
PF02376: domain 1 of 1, from 969 to 1022: score -42.8, E = 3.1
                   *->nqqigmneelDTaeIarrvkeeLkrhnIgQriFAekvLGlSQGslSd
                        qi ++             e L ++++  ri               
  gi|2464947   969    DNQIQDYR----------CSEVLLDRGVTERI--------------- 990  
                   LLrkPK.PWskLtqkGREpFrRMqnWLsdpnavrdlilqqek<-*
                    L++ ++PW++++q G+++ r M         + ++ ++q k   
  gi|2464947   991 -LSWNErPWDIMKQLGKDSIRQM---------ELMRFYMQHK    1022 
SM00361: domain 1 of 1, from 1031 to 1103: score -4.9, E = 1.3
                   *->lvlvnglvspeeakdEdferelseeeeyfgevgkinKivinkvtkrl
                       ++v +l +++++ + ++e+ l     ++  v+++nK++  ++ +  
  gi|2464947  1031    ALFVGNL-PTGLS-QRNYEQILN----KY--VTDENKFISIGPIY-- 1067 
                   NayenhkrgsggvYitFFersEDAarAivdlnGryfdGRtlkae<-*
                         + gs  v++tF e s  A rA  +l  +   ++ l+     
  gi|2464947  1068 -----YEYGS--VVLTF-EDSMKAVRAFYNLRETIIEDKKLLVL    1103 
SM00360: domain 1 of 1, from 1031 to 1103: score 11.0, E = 0.0059
                   *->tlfVgNLndppdvteedLrelF.kevksvevfraeteskfGkvvsvr
                       lfVgNL  p   ++   +++ +k+v               k +s+ 
  gi|2464947  1031    ALFVGNL--PTGLSQRNYEQILnKYV-----------TDENKFISIG 1064 
                   ivrdkdnilirressleqkvqlgkdsgTGkskGfaFVeFedeedAekAll
                    ++ +                           G    +Fed   A +A  
  gi|2464947  1065 PIYYEY--------------------------GSVVLTFEDSMKAVRA-- 1086 
                   iealnaskGkeledgGrptlglrVe<-*
                     +l     + +  ++++   l V    
  gi|2464947  1087 FYNLR---ETII--EDKK---LLVL    1103 
PF00076: domain 1 of 1, from 1032 to 1102: score 11.2, E = 0.015
                   *->lfVgNLppdvteedLkdlFskfGpivsikivkDhiekpketgkskGf
                      lfVgNLp+  +++  +++  k+   ++  i++             G 
  gi|2464947  1032    LFVGNLPTGLSQRNYEQILNKYVTDENKFISIG------PIYYEYGS 1072 
                   aFVeFeseedAekAlealnGkelggrklrv<-*
                      +Fe++  A +A  +l +++++++kl v   
  gi|2464947  1073 VVLTFEDSMKAVRAFYNLRETIIEDKKLLV    1102 
PF03208: domain 1 of 1, from 1098 to 1256: score -89.5, E = 6.5
                   *->sslevissikeslqsslsslRPWgEFldfsa.....fSrPsSfseat
                      ++l v+  +    +++ s  RP ++F+++++++ +++ + sSf  + 
  gi|2464947  1098    KKLLVLLLPNIEPSMVPSDVRPLLVFVNVKSggcqgLELISSFRKL- 1143 
                   sRvkrNlsyFrvNYvlIfavliiysLitnPllLvvililva.awlfLYlr
                         l  ++v  +   + l +y    +P+  +vi  l+++++  LY++
  gi|2464947  1144 ------LNPYQVFDLDNGGPLPGYV---QPITVFVIRPLIFdSIISLYVF 1184 
                   rsldepLVlfGrsisdrqlyvgLilvsipvlf..Ltgvgs........vl
                   r  +         i++  ++v+ + ++i+ +++ L +vg++++ ++++++
  gi|2464947  1185 R--Q---------ITNYKILVCGGDGTIGWVLqcLDNVGQdsecssppCA 1223 
                   iwtvgas..vvvvlvHAafrenpddlfvdEqee<-*
                   i  +g++++++ vl  ++ + + +d   ++ ++   
  gi|2464947  1224 IVPLGTGndLARVLCWGSGYTGGEDPLNLLRDV    1256 
COG1597: domain 1 of 1, from 1117 to 1512: score -73.3, E = 0.0053
                   *->mkrarliyNptaGkgkakkalrevadrLe.................k
                      +++ ++++N+++G+ ++  +   +++ L + +  + +++++ ++  +
  gi|2464947  1117    VRPLLVFVNVKSGGCQGLELISSFRKLLNpyqvfdldnggplpgyvQ 1163 
                   rggeasvrvttepgvagdAvriakeaaadgrieavDlviaaGGDGTinev
                    +  + +r ++ +       + ++++ +++       ++++GGDGTi+ v
  gi|2464947  1164 PITVFVIRPLIFDS--IISLYVFRQITNYK-------ILVCGGDGTIGWV 1204 
                   angLagtdgevkafnkpaLgilPaGTgNdFARaLgIPrddieaaakaiad
                    + L + + ++   + p+ +i+P+GTgNd+AR L +    ++   + +  
  gi|2464947  1205 LQCLDNVGQDSE-CSSPPCAIVPLGTGNDLARVLCWGS-GYTGGEDPLNL 1252 
                   gktrqvDlgrasyglqrekaneryflniaggGfgae.vtkrvneelkrrl
                         D+ +a+     e   +r+   +    +  e+++k+ +   ++++
  gi|2464947  1253 L----RDVIEAE-----EIRLDRWTVVFHPEDKPEEpAMKAPSQTTGKKK 1293 
                   GplaYllaalrrlsrlrpfplairvdgdgksfegealfllvnntn.....
                    +  + l+  +++++ +  p+  + d+ g+   ++   ++v+n++ + + 
  gi|2464947  1294 KAHQAHLSQSQQTNQHHQLPALTSSDISGGAQNEDNSQIFVMNNYfgigi 1343 
                   ..................................................
                   + +   + ++ +++++++ +++ ++++   + + ++  +++  ++ +++ 
  gi|2464947  1344 dadlcldfhnareenpnqfnsrlrnkgyyvkmglrkivgrkavkdlqkel 1393 
                   .............................npyyGGgmklaPdasldDGll
                   + + +++  + ++ ++    +  + +++ np      ++    ++ DG+l
  gi|2464947  1394 rlevdgkivelppvdgiiilnilswgsgaNPWGPDKDDQFSTPNHYDGML 1443 
                   dviivkaase.a.qllellrllrdvlrGkkhrehpevehlqakkieieth
                    v+ v    +++  l ++   +r+++r             q+ +i+i++ 
  gi|2464947  1444 EVVGV----TgVvHLGQIQSGIRTAMRI-----------AQGGHIKIHLN 1478 
                   gdqakpipvqlDGEiypgalPvririlpgalrvlvPadr<-*
                   +     +pvq+DGE+     P +  +l+ al + + + +   
  gi|2464947  1479 T----DMPVQVDGEPW-IQSPGDVVVLKSALKATMLKKN    1512 
PF00781: domain 1 of 1, from 1119 to 1267: score 167.8, E = 2e-47
                   *->plLVfvNPkSGggqgekelaseskllqkfrelLnprqVfdltktggp
                      plLVfvN kSGg+qg        +l+ +fr+lLnp+qVfdl++ ggp
  gi|2464947  1119    PLLVFVNVKSGGCQGL-------ELISSFRKLLNPYQVFDLDN-GGP 1157 
                   avg....................lelfrdlpdfkeqdqGddrvlvcGGDG
                    +g  ++ +    ++   ++  +l +fr++ ++         +lvcGGDG
  gi|2464947  1158 LPGyvqpitvfvirplifdsiisLYVFRQITNY--------KILVCGGDG 1199 
                   TvgwVlnaldklelplqcqrefpkPpvgilPlGTGNdLarvLgwgggydg
                   T+gwVl++ld+   ++ c      Pp++i+PlGTGNdLarvL wg gy+g
  gi|2464947  1200 TIGWVLQCLDNVGQDSECS----SPPCAIVPLGTGNDLARVLCWGSGYTG 1245 
                   aqlinekllkilgdaleeadtvmldrW<-*
                        e +l+ l+d++ ea  + ldrW   
  gi|2464947  1246 ----GEDPLNLLRDVI-EAEEIRLDRW    1267 
SM00046: domain 1 of 1, from 1119 to 1267: score 166.6, E = 4.6e-47
                   *->plLVfvNPkSGggqgeellkseskllrkfrelLnprqVfdltktggp
                      plLVfvN kSGg+qg        +l+ +fr+lLnp+qVfdl++ ggp
  gi|2464947  1119    PLLVFVNVKSGGCQGL-------ELISSFRKLLNPYQVFDLDN-GGP 1157 
                   dvgle....................lefrdvpkfkeqsdqkgddrvlvcG
                    +g++++ +    ++   ++  +  + fr++ ++           +lvcG
  gi|2464947  1158 LPGYVqpitvfvirplifdsiislyV-FRQITNY----------KILVCG 1196 
                   GDGTvgwVlnaldkrelplqcqvedrefpePPvailPlGTGNdLarvLgw
                   GDGT+gwVl++ld+  +++ c+        PP+ai+PlGTGNdLarvL w
  gi|2464947  1197 GDGTIGWVLQCLDNVGQDSECS-------SPPCAIVPLGTGNDLARVLCW 1239 
                   gggydginekllkilkealeeadtvkldrW<-*
                   g gy+g  e +l  l++++ ea+ + ldrW   
  gi|2464947  1240 GSGYTG-GEDPLNLLRDVI-EAEEIRLDRW    1267 
COG0284: domain 1 of 1, from 1333 to 1511: score -107.0, E = 2.3
                   *->laadlplsvmndprlIrvALDvpd..redalalveelddeeyvlfiK
                       ++ +  ++++d +l+   LD+++ ++e+  ++  +l    +  ++K
  gi|2464947  1333    FVMNNYFGIGIDADLC---LDFHNarEENPNQFNSRLRN--KGYYVK 1374 
                   vGlaFFeLflsaGpdivkeLkargklgvkvFLDLKlhDIPnTvalaakal
                    Gl   +     G ++vk+L ++     ++  D K+   P     ++  +
  gi|2464947  1375 MGL---RKIV--GRKAVKDLQKEL----RLEVDGKIVELPPVDGIIILNI 1415 
                   aelgplAaDmvtVHafgGeemlraavealeelgkGkrPlLiaVtvLTSms
                     +g           +g           + + g                +
  gi|2464947  1416 LSWG-----------SG-----------ANPWG----------------P 1427 
                   epgllqeigidnsladqvirlaklakeaGldGvVcGAspqeaaaiRealg
                   ++++    ++ ++++    + + l  + G+ GvV    +q  + iR a+ 
  gi|2464947  1428 DKDD----QFSTPNH----YDGMLEVV-GVTGVVH--LGQIQSGIRTAMR 1466 
                   egspdflilTPGIRaDkgsakgDQgRvmTpaeAiaaGaDyiVVGRpItqA
                     +           a       ++g     ++++       V G p  q+
  gi|2464947  1467 --I-----------A-------QGGH---IKIHLNTDMPVQVDGEPWIQS 1493 
                   GedPvaaaeaireaaemalee<-*
                    +  v +++ ++  + ++l++   
  gi|2464947  1494 -PGDVVVLKSAL--KATMLKK    1511 
SM00045: domain 1 of 1, from 1334 to 1489: score 262.2, E = 7.8e-76
                   *->iMNNYFSiGvDAkiaLeFHnsREanPekFnSRlkNKlwYfelGtkel
                      +MNNYF+iG+DA+ +L+FHn+RE+nP  FnSRl+NK +Y+++G++++
  gi|2464947  1334    VMNNYFGIGIDADLCLDFHNAREENPNQFNSRLRNKGYYVKMGLRKI 1380 
                   fa.rtcKdLheqIeLecDGvdidlpnkdlslEGIivLNIPSygGGtnLWG
                   + ++  KdL +  +Le+DG++++lp       GIi+LNI S+g+G+n+WG
  gi|2464947  1381 VGrKAVKDLQKELRLEVDGKIVELP----PVDGIIILNILSWGSGANPWG 1426 
                   ePfgskkkravcgifkksftdkedlnfekqsidDgllEVVGvtgamhmaq
                                        ++d++f+ + ++Dg+lEVVGvtg++h++q
  gi|2464947  1427 P--------------------DKDDQFSTPNHYDGMLEVVGVTGVVHLGQ 1456 
                   vrtsiqvglasiilvkllKgrRiaQCsevrlkdtiltkktiPmQVDGEP<
                   +    q g +         + RiaQ+ ++++   + t   +P+QVDGEP 
  gi|2464947  1457 I----QSGIRT--------AMRIAQGGHIKI--HLNT--DMPVQVDGEP  1489 
                   -*
                     
  gi|2464947     -    -    
PF00609: domain 1 of 1, from 1334 to 1489: score 237.0, E = 2.9e-68
                   *->iINNYFSiGVDAsialrFHimREknPekFnSRmkNKlwYfefGtset
                      ++NNYF+iG+DA ++l+FH +RE+nP  FnSR++NK++Y+++G+ ++
  gi|2464947  1334    VMNNYFGIGIDADLCLDFHNAREENPNQFNSRLRNKGYYVKMGLRKI 1380 
                   l.astcknLhesvelecdGqevdLsnrDaslEGIiiLNIPSygGGsnLWG
                   ++ +  k+L +  +le+dG+ v+L+       GIiiLNI S+g+G+n+WG
  gi|2464947  1381 VgRKAVKDLQKELRLEVDGKIVELP----PVDGIIILNILSWGSGANPWG 1426 
                   eskkgkgdigefkksitdpkdlktavqdidDgLlEVVGlegamhmgQiyT
                                     +kd++++ + + Dg lEVVG++g++h+gQi  
  gi|2464947  1427 P-----------------DKDDQFSTPNHYDGMLEVVGVTGVVHLGQI-- 1457 
                   siqlklasWvkLmkgrRlaQCsevRlkDtiktkktlPMQVDGEP<-*
                     q + +       + R+aQ+  +    +i+ +  +P+QVDGEP   
  gi|2464947  1458 --QSGIRT------AMRIAQGGHI----KIHLNTDMPVQVDGEP    1489 
SM00463: domain 1 of 1, from 1346 to 1416: score -20.0, E = 9.4
                   *->ewtLDLHGltveeAlqaLkkfldaarlrgletgervdlpkkleIitG
                      +++LD+H  ++e   q+ ++  +   + +                  
  gi|2464947  1346    DLCLDFHNAREENPNQFNSRLRNKGYYVK------------------ 1374 
                   kGkhslvngkskvkpalkehlqHkhvesfrfaepsegnsGvlvvklk<-*
                   +G  ++  g+++vk   ke+    +v++ + +      +G +++ ++   
  gi|2464947  1375 MGLRKI-VGRKAVKDLQKELRL--EVDGKIVEL--PPVDGIIILNIL    1416 
PF04014: domain 1 of 1, from 1447 to 1487: score -14.1, E = 5.1
                   *->ivvKVdrnGqIVIPkeiRekLGikeGDiLEievdgdggeIilrkykp
                       v+ V  +GqI     iR ++ i+ G  + i ++ d      +++++
  gi|2464947  1447    GVTGVVHLGQIQ--SGIRTAMRIAQGGHIKIHLNTDM----PVQVDG 1487 
                   <-*
                      
  gi|2464947     -     -    
//
-------------- next part --------------
###################################################################################################### */
# COPYRIGHT INFORMATION
# Pfam DOMAIN RESULTS PARSER
# @AUTHOR: Wagied Davids
# @DATE: 22.01.2004
###################################################################################################### */
import sys
import string
import re
import time
class PfamEntry:
    '''
    Prototype class Entry structure
    @author: Wagied Davids
    @date: 22.01.2004
    @copyright: Wagied Davids, ?, 2004
    '''
    # STATIC DATA
    NO_HITS= '[no hits above thresholds]';
    # STATIC REGEX OBJECTS
    REGEX_FAMILY_SCORES= re.compile( r'((\S.*?)\s+(\S.*?)\s+((-| )\S.*?)\s+(\S.*?)\s+(\d+))', re.MULTILINE | re.DOTALL );
   
    def __init__( self, query= None, accession= None, description= None, family_scores= [], parsed_domains= [], alignments= [] ):
        '''
        Constructor for Pfam Entry structure
        @param ( query= None, accession= None, description= None, family_scores= [], parsed_domains= [], alignments= [] )
        @return (None)
        '''
        if query != None:
            self.query= query;
            self.accession= accession;
            self.description= description;
            self.family_scores= family_scores;
            self.family_scores_hitlist= [];          # FAMILY SCORES HITLIST FOR SCORE ENTRIES
            self.parsed_domains= parsed_domains;
            self.alignments= alignments;
        else:
            print 'Error: Query must be provided';
            sys.exit( -1 );
    def getQuery( self ):
        '''
        Retrieves the QUERY
        @param (None)
        @return (String: QUERY )
        '''
        return self.query;
    def getAccession( self ):
        '''
        Retrieves the ACCESSION
        @param (None)
        @return (String: ACCESSION);
        '''
        return self.accession;
    def getDescription( self ):
        '''
        Retrieves the DESCRIPTION
        @param (None)
        @return (String: DESCRIPTION)
        '''
        return self.description;
    def getFamilyScoresRaw( self ):
        '''
        Retrieves a list of FAMILY SCORES
        @param (None)
        @return (List: FAMILY SCORES)
        '''
        return self.family_scores;
    def getNoOfFamilyEntries( self ):
        '''
        Retrieves the number of hits per query
        @param (None)
        @return (Integer: number of hits per query)
        '''
        return len( self.family_scores );
    def getFamilyScoresML( self ):
        '''
        FINE-GRAINED CONTROL OVER FAMILY CLASSIFICATION AND SCORE RESULTS
        @param (None)
        @return ()
        '''
        # BEGIN FAMILY_SCORE_LIST TAG
        family_scores= "<FAMILY_SCORES_HITLIST>\n";
        family_scores_counter= 1;
        
        for score_entry in self.getFamilyScoresRaw():
            MATCH_SCORE_ENTRY= PfamEntry.REGEX_FAMILY_SCORES.search( score_entry );
            if MATCH_SCORE_ENTRY != None:
                # BEGIN FAMILY_SCORE_HIT TAG
                family_scores= family_scores + "\t\t<FAMILY_SCORE_HIT= %d>\n" % ( family_scores_counter );
                # EXTRACT INFORMATION FROM MATCH_SCORE_ENTRY
                # MATCH_SCORE_ENTRY.group( 1 ) equals WHOLE ENTRY
                FAMILY_MODEL= MATCH_SCORE_ENTRY.group( 2 );
                FAMILY_DESCRIPTION= MATCH_SCORE_ENTRY.group( 3 );
                FAMILY_SCORE_VALUE= MATCH_SCORE_ENTRY.group( 4 );
                # MATCH_SCORE_ENTRY.group( 5 ) equals '-' IF PRESENT
                FAMILY_E_VALUE= MATCH_SCORE_ENTRY.group( 6 );
                FAMILY_N_VALUE= MATCH_SCORE_ENTRY.group( 7 );
                # FORMAT ENTRY TAGS
                family_scores= family_scores + "\t\t\t<FAMILY_SCORE_MODEL>%s</FAMILY_SCORE_MODEL>\n" % ( FAMILY_MODEL );
                family_scores= family_scores + "\t\t\t<FAMILY_SCORE_DESCRIPTION>%s</FAMILY_SCORE_DESCRIPTION>\n" % ( FAMILY_DESCRIPTION );
                family_scores= family_scores + "\t\t\t<FAMILY_SCORE_VALUE>%s</FAMILY_SCORE_VALUE>\n" % ( FAMILY_SCORE_VALUE );
                family_scores= family_scores + "\t\t\t<FAMILY_E_VALUE>%s</FAMILY_E_VALUE>\n" % ( FAMILY_E_VALUE );
                family_scores= family_scores + "\t\t\t<FAMILY_N_VALUE>%s</FAMILY_N_VALUE>\n" % ( FAMILY_N_VALUE );
                # COMPLETE FAMILY_SCORE_HIT TAG
                family_scores= family_scores + "\t\t</FAMILY_SCORE_HIT>\n";
                # INCREMENT family_scores_counter
                family_scores_counter= family_scores_counter + 1;
        # COMPLETE FAMILY_SCORE_LIST TAG
        family_scores= family_scores + "\t</FAMILY_SCORES_HITLIST>\n";
        return family_scores;
    
    def getParsedDomainsRaw( self ):
        '''
        Retrieves a list of PARSED DOMAINS
        @param (None)
        @return (List: PARSED DOMAINS)
        '''
        return self.parsed_domains;
    def getNoOfParsedDomains( self ):
        '''
        Retrieves the number of parsed hits per query
        @param (None)
        @return (Integer: number of parsed hits per query)
        '''
        return len( self.parsed_domains ); 
    def getParsedDomainsML( self ):
        '''
        FINE-GRAINED CONTROL OVER PARSED DOMAINS AND SCORE RESULTS
        @param (None)
        @return ()
        '''
        parsed_domain_list= [];
        PARSED_MODEL= '';
        PARSED_DOMAIN_NUMBER= '';
        PARSED_DOMAIN_SEQ_F= '';
        PARSED_DOMAIN_SEQ_T= '';
        PARSED_DOMAIN_SEQ_F= '';
        PARSED_DOMAIN_2_DOTS= '';
        PARSED_DOMAIN_BRACKETS= '';
        PARSED_DOMAIN_SCORE= '';
        PARSED_DOMAIN_E_VALUE= '';
        parsed_domains_counter= 1;
        # BEGIN PARSED_DOMAINS_LIST TAG
        parsed_domains= '<PARSED_DOMAINS_HITLIST>\n';
    
        for domain in self.getParsedDomainsRaw():
            # IF NO_HITS NOT FOUND, THEN EXTRACT DATA
            if string.find( domain, PfamEntry.NO_HITS ) < 0:
                parsed_domain_list= string.split( domain );
                PARSED_MODEL= parsed_domain_list[0];
                PARSED_DOMAIN_NUMBER= parsed_domain_list[1];
                PARSED_DOMAIN_SEQ_F= parsed_domain_list[2];
                PARSED_DOMAIN_SEQ_T= parsed_domain_list[3];
                #PARSED_DOMAIN_2_DOTS= parsed_domain_list[4];           
                PARSED_DOMAIN_HMM_F= parsed_domain_list[5];
                PARSED_DOMAIN_HMM_T= parsed_domain_list[6];
                #PARSED_DOMAIN_BRACKETS= parsed_domain_list[7];
                PARSED_DOMAIN_SCORE= parsed_domain_list[8];
                PARSED_DOMAIN_E_VALUE= parsed_domain_list[9];
                # BEGIN PARSED_DOMAIN_HIT TAG
                parsed_domains= parsed_domains + "\t\t<PARSED_DOMAIN_HIT= %d>\n" % ( parsed_domains_counter );
                # FORMAT ENTRY TAGS
                parsed_domains= parsed_domains + "\t\t\t<PARSED_MODEL>%s</PARSED_MODEL>\n" % ( PARSED_MODEL );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_NUMBER>%s</PARSED_DOMAIN_NUMBER>\n" % ( PARSED_DOMAIN_NUMBER );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_SEQ_F>%s</PARSED_DOMAIN_SEQ_F>\n" % ( PARSED_DOMAIN_SEQ_F );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_SEQ_T>%s</PARSED_DOMAIN_SEQ_T>\n" % ( PARSED_DOMAIN_SEQ_T );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_HMM_F>%s</PARSED_DOMAIN_HMM_F>\n" % ( PARSED_DOMAIN_HMM_F );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_HMM_T>%s</PARSED_DOMAIN_HMM_T>\n" % ( PARSED_DOMAIN_HMM_T );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_SCORE>%s</PARSED_DOMAIN_SCORE>\n" % ( PARSED_DOMAIN_SCORE );
                parsed_domains= parsed_domains + "\t\t\t<PARSED_DOMAIN_E_VALUE>%s</PARSED_DOMAIN_E_VALUE>\n" % ( PARSED_DOMAIN_E_VALUE );
                # COMPLETE PARSED_DOMAIN_HIT TAG
                parsed_domains= parsed_domains + "\t\t</PARSED_DOMAIN_HIT>\n";
                # INCREMENT parsed_domains_counter
                parsed_domains_counter= parsed_domains_counter + 1;
            else:
                # NO_HITS FOUND
                return domain;
        # COMPLETE PARSED_DOMAINS_LIST TAG
        parsed_domains= parsed_domains + '</PARSED_DOMAINS_HITLIST>\n';
        return parsed_domains;
        
    def getAlignments( self ):
        '''
        Retrieves a list of TOP SCORING ALIGNMENTS
        @param (None)
        @return (List: TOP SCORING ALIGNMENTS)
        '''
        return self.alignments;
    def getRegexFamilyScores( self ):
        '''
        Retrieves the Regex object for Pfam family scores
        @param (None)
        @return (Regex: Regex object for Pfam family scores)
        '''
        return PfamEntry.REGEX_FAMILY_SCORES;
    def __str__( self ):
        '''
        Retrieves a string representation of parser entry class
        @param (None)
        @return (None)
        '''
        strBuffer= '';
        strBuffer= strBuffer + "<HMMER>\n";
        strBuffer= strBuffer + "\t<QUERY>%s</QUERY>\n" % ( self.getQuery() );
        strBuffer= strBuffer + "\t<ACCESSION>%s</ACCESSION>?\n" % ( self.getAccession() );
        strBuffer= strBuffer + "\t<DESCRIPTION>%s</DESCRIPTION>\n" % ( self.getDescription() ); 
        strBuffer= strBuffer + "\t%s" % ( self.getFamilyScoresML() );
        strBuffer= strBuffer + "\t%s" % ( self.getParsedDomainsML() );
        strBuffer= strBuffer + "\t<ALIGNMENTS>%s</ALIGNMENTS>\n" % ( self.getAlignments() );
        strBuffer= strBuffer + "</HMMER>";
        
        return strBuffer;
class PfamParser:
    '''
    Prototype class for parsing hmmpfam output
    @author: Wagied Davids
    @date: 22.01.2004
    @copyright: Wagied Davids, ?, 2004
    '''
    # DECLARATION OF STATIC DATA
    HMM_HEADER_SEPERATOR= '-';
    HMM_FILE= 'HMM file:';
    HMM_SEQ_FILE= 'Sequence file:';
    HMM_QUERY_SEQ= 'Query sequence:';
    HMM_ACC= 'Accession:';
    HMM_DESCRIPTION= 'Description:';
    HMM_SCORE_HEADER= 'Scores for sequence family classification (score includes all domains):';
    HMM_PARSED_DOMAINS= 'Parsed for domains:';
    HMM_ALIGNMENT= 'Alignments of top-scoring domains:';
    HMM_SPACE= ' ';
    HMM_TAB= '\t';
    HMM_NEWLINE= '\n';
    HMM_ENTRY_SEPERATOR= '//';
    HMM_ENTRY_COUNTER= 0;
    # STATIC DATA STRUCTURE    
    # STATIC REGEX OBJECTS
    REGEX_HMM_ENTRY= re.compile( r'(Query sequence:\s+\S.*\s+//)', re.MULTILINE | re.DOTALL );
    REGEX_HMM_QUERY= re.compile( r'Query sequence:\s+(\S.*?)\s+Accession', re.MULTILINE | re.DOTALL );
    REGEX_HMM_ACC= re.compile( r'Accession:\s+(\S.*?)\s+Description', re.MULTILINE | re.DOTALL );
    REGEX_HMM_DESCRIPTION= re.compile( r'Description:\s+(\S.*?)\s+Scores', re.MULTILINE | re.DOTALL );
    REGEX_HMM_SEQ_FAMILY_SCORES= re.compile( r'(Scores\s+\S.*)\s+Parsed', re.MULTILINE | re.DOTALL );
    REGEX_HMM_PARSED_DOMAINS= re.compile( r'(Parsed for domains:\s+\S.*)\s+Alignments', re.MULTILINE | re.DOTALL );
    REGEX_HMM_ALIGNMENTS= re.compile( r'(Alignments of top-scoring domains:\s+\S.*)\s+//', re.MULTILINE | re.DOTALL );
    def __init__( self, filename= None ):
        '''
        Constructor for PfamParser
        @param (Filename)
        @return (None)
        '''
        if filename != None:
            self.filename= filename;
            self.debug= 0;
            self.HMM_FAMILY_SCORES_HITS= {};
            self.HMM_PARSED_DOMAINS= {};
        else:
            print 'Please enter filename';
            sys.exit ( -1 );
    def setDebug( self, debug= 0 ):
        '''
        Sets the debug level when parsing
        debug= 0 No debug information
        debug= 1 Pfam Entry level debug information  
        debug= 2 Regex level debug information
        debug= 3 Incoming data 
        @param (Integer representing the verbosity/ debug level)
        @return (None)
        '''
        self.debug= debug;
        return ;
    def getFilename( self ):
        '''
        Retrieves the filename
        @param (None)
        @return (String: Filename)
        '''
        return self.filename;
    def parse( self ):
        '''
        MAIN PARSER FUNCTION
        @param (None)
        @return (None)
        '''
        try:
            mode= 'r';
            line= '';
            data_entry= '';
            HMM_QUERY= '';
            HMM_ACC= '';
            HMM_DESCRIPTION= '';
            HMM_SCORES= '';
            HMM_DOMAINS= '';
            HMM_ALIGNMENTS= '';
            # FAMILY SCORES
            FAMILY_SCORES_TITLE= '';
            FAMILY_SCORES_HEADER= '';
            FAMILY_SCORES_INFO_LIST= [];
            FAMILY_SCORES_LIST= [];
            
            # PARSED DOMAIN HITS
            HMM_DOMAINS= '';
            PARSED_DOMAINS_INFO_LIST= [];
            PARSED_DOMAINS_TITLE= '';
            PARSED_DOMAINS_HEADER= '';
            PARSED_DOMAINS_LIST= [];
            # DOMAIN ALGINMENTS INFORMATION
            DOMAIN_ALIGN_HEADER= '';
            DOMAIN_ALIGNMENTS_LIST= [];
            # Open file stream for reading
            fopen= open( self.filename, mode );
            while fopen:
                line= fopen.readline();
                if not line: break;
                # Pfam ENTRY DETECTED
                if line[ 0: len( PfamParser.HMM_QUERY_SEQ ) ] == PfamParser.HMM_QUERY_SEQ:
                    data_entry= data_entry + string.rstrip( line ) + PfamParser.HMM_SPACE;
                    while line[ 0: len( PfamParser.HMM_ENTRY_SEPERATOR ) ] != PfamParser.HMM_ENTRY_SEPERATOR:
                        line= fopen.readline();
                        if not line: break;
                        if line[ 0: len( PfamParser.HMM_ENTRY_SEPERATOR ) ] == PfamParser.HMM_ENTRY_SEPERATOR:
                            PfamParser.HMM_ENTRY_COUNTER= PfamParser.HMM_ENTRY_COUNTER + 1;
                            data_entry= data_entry + PfamParser.HMM_SPACE + PfamParser.HMM_ENTRY_SEPERATOR;
                            
                            # EXTRACT PFAM ENTRY INFORMATION
                            # DEBUG INFO
                            if self.debug == 3:
                                print data_entry;
                            
                            # MATCH ENTRY STRUCTURE
                            MATCH_HMM_ENTRY= PfamParser.REGEX_HMM_ENTRY.search( data_entry );
                            # DEBUG INFO
                            if self.debug == 2:
                                print "%s: %s" % ( MATCH_HMM_ENTRY, MATCH_HMM_ENTRY.re.pattern );
                                
                            if MATCH_HMM_ENTRY != None:
                                # DEBUG INFO
                                if self.debug == 2:
                                    print "%d. %s" % ( PfamParser.HMM_ENTRY_COUNTER, MATCH_HMM_ENTRY );
                                    
                                self.ENTRY= MATCH_HMM_ENTRY.group( 1 );
                                #print self.ENTRY;
                                
                                # MATCH QUERY SEQUENCE
                                MATCH_HMM_QUERY= PfamParser.REGEX_HMM_QUERY.search( self.ENTRY );
                                # DEBUG INFO
                                if self.debug == 2:
                                    print "%s: %s" % ( MATCH_HMM_QUERY, MATCH_HMM_QUERY.re.pattern );
                                    
                                if MATCH_HMM_QUERY != None:
                                    HMM_QUERY= MATCH_HMM_QUERY.group( 1 );
                                    #print HMM_QUERY, '-> ', 
                                # MATCH ACCESSION
                                MATCH_HMM_ACC= PfamParser.REGEX_HMM_ACC.search( self.ENTRY );
                                # DEBUG INFO
                                if self.debug == 2:
                                    print "%s: %s" % ( MATCH_HMM_ACC, MATCH_HMM_ACC.re.pattern );
                                    
                                if MATCH_HMM_ACC != None:
                                    HMM_ACC=  MATCH_HMM_ACC.group( 1 );
                                    #print HMM_ACC;
                                # MATCH DESCRIPTION
                                MATCH_HMM_DESCRIPTION= PfamParser.REGEX_HMM_DESCRIPTION.search( self.ENTRY );
                                 # DEBUG INFO
                                if self.debug == 2:
                                    print "%s: %s" % ( MATCH_HMM_DESCRIPTION, MATCH_HMM_DESCRIPTION.re.pattern );
                                    
                                if MATCH_HMM_DESCRIPTION != None:
                                    HMM_DESCRIPTION= MATCH_HMM_DESCRIPTION.group( 1 );
                                    #print HMM_DESCRIPTION;
                                # MATCH FAMILY SCORES
                                # NB !!! --- MAXIMUM RECURSION LIMIT ---- !!!
                                try:
                                    MATCH_HMM_SCORES= PfamParser.REGEX_HMM_SEQ_FAMILY_SCORES.search( self.ENTRY );
                                    # DEBUG INFO
                                    if self.debug == 2:
                                        print "%s: %s" % ( MATCH_HMM_SCORES, MATCH_HMM_SCORES.re.pattern );
                                        
                                    if MATCH_HMM_SCORES != None:
                                        HMM_SCORES= MATCH_HMM_SCORES.group( 1 );
                                        FAMILY_SCORES_INFO_LIST= string.split( HMM_SCORES, PfamParser.HMM_NEWLINE );
                                        FAMILY_SCORES_TITLE= FAMILY_SCORES_INFO_LIST[0];
                                        FAMILY_SCORES_HEADER= FAMILY_SCORES_INFO_LIST[1];
                                        #FAMILY_SCORES_TABLINE= FAMILY_SCORES_INFO_LIST[2];
                                        
                                        # NOTE: LAST ELEMENT = EMPTY SPACE
                                        # COLLECT IN HASH
                                        FAMILY_SCORES_LIST= FAMILY_SCORES_INFO_LIST[ 3: -1 ];
                                        self.HMM_FAMILY_SCORES_HITS[ HMM_QUERY ]= FAMILY_SCORES_LIST;
                                                                                                                    
                                except RuntimeError, run_err:
                                    print "Error: MATCHING PFAM FAMILY SCORES!"
                                    print run_err;
                                # MATCH PARSED DOMAIN INFORMATION
                                # NB !!! --- MAXIMUM RECURSION LIMIT ---- !!!
                                try:
                                    MATCH_HMM_PARSED_DOMAINS= PfamParser.REGEX_HMM_PARSED_DOMAINS.search( self.ENTRY );
                                    # DEBUG INFO
                                    if self.debug == 2:
                                        print "%s: %s" % ( MATCH_HMM_PARSED_DOMAINS, MATCH_HMM_PARSED_DOMAINS.re.pattern );
                                        
                                    if MATCH_HMM_PARSED_DOMAINS != None:
                                        HMM_DOMAINS= MATCH_HMM_PARSED_DOMAINS.group( 1 );
                                        PARSED_DOMAINS_INFO_LIST= string.split( HMM_DOMAINS, PfamParser.HMM_NEWLINE );
                                        PARSED_DOMAINS_TITLE= PARSED_DOMAINS_INFO_LIST[0];
                                        PARSED_DOMAINS_HEADER= PARSED_DOMAINS_INFO_LIST[1];
                                        #PARSED_DOMAINS_TABLINE= PARSED_DOMAINS_INFO_LIST[2];
                                        # NOTE: LAST ELEMENT = EMPTY SPACE
                                        # COLLECT IN HASH
                                        PARSED_DOMAINS_LIST= PARSED_DOMAINS_INFO_LIST[ 3: -1 ];
                                        self.HMM_PARSED_DOMAINS[ HMM_QUERY ]= PARSED_DOMAINS_LIST;
                                        
                                except RuntimeError, run_err:
                                    print "Error: MATCHING PFAM PARSED DOMAIN INFORMATION!";
                                    print run_err;
                                    
                                # MATCH DOMAIN ALIGNMENTS
                                # NB !!! --- MAXIMUM RECURSION LIMIT ---- !!!
                                try:
                                    MATCH_HMM_ALIGNMENTS= PfamParser.REGEX_HMM_ALIGNMENTS.search(  self.ENTRY );
                                    # DEBUG INFO
                                    if self.debug == 2:
                                        print "%s: %s" % ( MATCH_HMM_ALIGNMENTS, MATCH_HMM_ALIGNMENTS.re.pattern );
                                        
                                    if MATCH_HMM_ALIGNMENTS != None:
                                        HMM_ALIGNMENTS= MATCH_HMM_ALIGNMENTS.group( 1 );
                                        DOMAIN_ALIGNMENTS_INFO_LIST= string.split( HMM_ALIGNMENTS , "\n" );
                                        DOMAIN_ALIGN_HEADER= DOMAIN_ALIGNMENTS_INFO_LIST[0];
                                        DOMAIN_HIT_INFO= DOMAIN_ALIGNMENTS_INFO_LIST[1];
                                        DOMAIN_ALIGNMENTS_LIST= DOMAIN_ALIGNMENTS_INFO_LIST[ 3:-2 ];
                                        #print DOMAIN_ALIGNMENTS_LIST;
                                        
                                except RuntimeError, run_err:
                                    print "Error: MATCHING PFAM DOMAIN ALIGNMENTS!";
                                    print run_err;
                                   
                                
                                # Construct Pfam Entry structure
                                Entry= PfamEntry( HMM_QUERY, HMM_ACC, HMM_DESCRIPTION, FAMILY_SCORES_LIST, PARSED_DOMAINS_LIST, DOMAIN_ALIGNMENTS_LIST);
                                # DEBUG INFO
                                if self.debug == 1:
                                    print Entry;
                                    #print "%s => %s" % ( Entry.getQuery(), Entry.getDescription() );
                                    #print Entry.getFamilyScoresML();
                                    #print Entry.getParsedDomainsML();
                                
                                # CLEAN DATA VARIABLE
                                data_entry= '';
                                HMM_QUERY= '';
                                HMM_ACC= '';
                                HMM_DESCRIPTION= '';
                             
                                # FAMILY SCORES INFORMATION
                                HMM_SCORES= '';
                                FAMILY_SCORES_INFO_LIST= [];
                                FAMILY_SCORES_TITLE= '';
                                FAMILY_SCORES_HEADER= '';
                                FAMILY_SCORES_LIST= [];
                                # PARSED DOMAINS INFORMATION
                                HMM_DOMAINS= '';
                                PARSED_DOMAINS_INFO_LIST= [];
                                PARSED_DOMAINS_TITLE= '';
                                PARSED_DOMAINS_HEADER= '';
                                PARSED_DOMAINS_LIST= [];
                                                                
                                # DOMAIN INFORMATION
                                HMM_ALIGNMENTS= '';
                                DOMAIN_ALIGNMENTS_INFO_LIST= [];
                                DOMAIN_ALIGN_HEADER= '';
                                DOMAIN_HIT_INFO= '';
                                DOMAIN_ALIGNMENTS_LIST= [];
                        # ///////////////////////////////////////// ACCUMULATE DATA //////////////////////////////////////
                        else:
                            data_entry= data_entry + line ;
               
        except IOError, io_err:
            print io_err;
        else:
            fopen.close();
            return ;
    def getPfamHits( self ):
        '''
        Retrieves Pfam hits QUERY => HITS DATABASE
        @param (None)
        @return (Hash: Pfam hits QUERY => HITS DATABASE)
        '''
        return self.HMM_FAMILY_SCORES_HITS;
    def getNoOfHits( self ):
        '''
        Retrieves the number of Pfam hits
        @param (None)
        @return (Integer: number of Pfam hits)
        '''
        return len( self.getPfamHits() );
    def getPfamParsedDomains( self ):
        '''
        Retrieve Pfam parsed domains QUERY => HITS DATABASE
        @param (None)
        @return (Hash: Pfam parsed domains QUERY => HITS DATABASE)
        '''
        return self.HMM_PARSED_DOMAINS;
    def getNoOfParsedDomains( self ):
        '''
        Retrieves the number of PARSED Pfam hits
        @param (None)
        @return (Integer: number of PARSED Pfam hits)
        '''
        return len( self.getPfamParsedDomains() );
    def getRegexHMM_Entry( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_ENTRY
        @param (None)
        @return (Regex: HMM_ENTRY)
        '''
        return PfamParser.REGEX_HMM_ENTRY;
    def getRegexQuery( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_QUERY
        @param (None)
        @return (Regex: REGEX_HMM_QUERY)
        '''
        return PfamParser.REGEX_HMM_QUERY;
    
    def getRegexAccession( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_ACC
        @param (None)
        @return (Regex: REGEX_HMM_ACC)
        '''
        return PfamParser.REGEX_HMM_ACC;
        
    def getRegexDescription( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_DESCRIPTION
        @param (None)
        @return (Regex: REGEX_HMM_DESCRIPTION)
        '''
        return PfamParser.REGEX_HMM_DESCRIPTION;
    def getRegexFamilyScores( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_SEQ_FAMILY_SCORES
        @param (None)
        @return (Regex: REGEX_HMM_SEQ_FAMILY_SCORES)
        '''
        return PfamParser.REGEX_HMM_SEQ_FAMILY_SCORES;
    def getRegexParsedDomains( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_DOMAINS
        @param (None)
        @return (Regex: REGEX_HMM_DOMAINS)
        '''
        return PfamParser.REGEX_HMM_PARSED_DOMAINS;
    def getRegexAlignments( self ):
        '''
        Retrieves the Regex object for REGEX_HMM_ALIGNMENTS
        @param (None)
        @return (Regex: REGEX_HMM_ALIGNMENTS)
        '''
        return PfamParser.REGEX_HMM_ALIGNMENTS;
    
    def __str__( self ):
        '''
        Retrieves a string representation of parser class
        @param (None)
        @return (String: Retrieves a string representation of parser class)
        '''
        strBuffer= 'ParserType: PfamParser';
        strBuffer= strBuffer + "Filename: %s" % ( self.getFilename() );
        return strBuffer;
        
-------------- next part --------------
#!/usr/bin/env python
###################################################################################################### */
# COPYRIGHT INFORMATION
# Pfam DOMAIN RESULTS PARSER
# @AUTHOR: Wagied Davids
# @DATE: 22.01.2004
###################################################################################################### */
import string 
import PfamParser
# Module level re-name
PfamParser= PfamParser.PfamParser;
# DATA LOCATION
filename= 'hmmpfam_output.example';
# DATA STRUCTURE
PFAM_DB= {};
# Construct Parser
parser= PfamParser( filename );
# SET DEBUG LEVEL
parser.setDebug( 1 );
# parse document
parser.parse();
# retrieve Pfam hits
PFAM_DB= parser.getPfamParsedDomains();
counter= 1;
for QUERY in PFAM_DB.keys():
    for HIT in PFAM_DB[ QUERY ]:
        print "%d. %s => %s"  % ( counter, QUERY, HIT );
        counter= counter + 1;
    
    
More information about the Biopython-dev
mailing list