[BioPython] Formatting files for Clustalw

meames at itsa.ucsf.edu
Tue Aug 23 15:42:32 EDT 2005


Hi all

I'm working my way through the cookbook and I've run into a snag in
section 3.5.1 - Clustalw

I've created a simple two-entry FASTA file to align, but the parser
appears to reject it. The titles contain no question marks or other
punctuation of the kind that, from what I've read on this board, would
seem to give it trouble, so I'm at a bit of a loss. Can anyone help?
(I'm running clustalw 1.81)


Here is the error message:

Traceback (most recent call last):
  File "./practice.py", line 21, in ?
    alignment = Clustalw.do_alignment(cline)
  File "/usr/lib/python2.3/site-packages/Bio/Clustalw/__init__.py", line 116, in do_alignment
    return parse_file(out_file, alphabet)
  File "/usr/lib/python2.3/site-packages/Bio/Clustalw/__init__.py", line 55, in parse_file
    parser.parseFile(to_parse)
  File "/usr/lib/python2.3/site-packages/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/usr/lib/python2.3/site-packages/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 0
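
A note on reading this traceback: the file handed to parse_file is
out_file, i.e. the alignment ClustalW was supposed to write, not the FASTA
input. A failure "at or beyond character 0" would be consistent with that
output file being missing, empty, or not in Clustal format at all. The
sketch below is one rough way to check this before parsing. It assumes only
that a Clustal alignment file starts with the word "CLUSTAL";
looks_like_clustal is a hypothetical helper name, not part of Biopython.

```python
import os


def looks_like_clustal(path):
    """Return True if path exists, is non-empty, and starts with 'CLUSTAL'.

    Hypothetical helper for illustration only; it checks the header line
    and nothing else, so a True result does not guarantee a valid alignment.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path) as handle:
        return handle.readline().startswith("CLUSTAL")


# Example use (assumes the output name from the script, 'test.aln'):
# if not looks_like_clustal("test.aln"):
#     print("ClustalW did not produce a usable alignment file")
```

If the check fails, the problem is more likely in how ClustalW was invoked
than in the parser itself.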



Here is the code:
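from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL

# Build the ClustalW command line for the input file, name the
# output file, then run the alignment and parse the result.
cline = MultipleAlignCL('file_to_align')
cline.set_output('test.aln')
alignment = Clustalw.do_alignment(cline)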

Here is the file_to_align:
>sptrembl|Q00647|Q00647 Myosin I heavy chain.  [Emericella nidulans]
MGHSRRPAGGEKKSRFGRSKAAADVGDGRQAGGKPQVRKAVFESTKKKEIGVSDLTLLSK
ISNEAINDNLKLRFQHDEIYTYIGHVLVSVNPFRDLGIYTDSVLNSYRGKNRLEVPPHVF
AVAESAYYNMKSYKDNQCVIISGESGAGKTEAAKRIMQYIASVSGGSDSSIQQTKDMVLA
TNPLLESFGNAKTLRNNNSSRFGKYLELEFNAQGEPVGANITNYLLEKSRVVGQITNERN
FHIFYQFAKGAPQKYRDSFGVQQPQSYLYTSRSKCFDVPGVDDVAEFQDTLNAMSVIGMS
EAEQDNVFRMLAAILWMGNIQFAEDDSGNAAITDQSVVDFVAYLLEVDAGQVNQALTIRM
METSRGGRRGSVYEVPLNTTQALAVRDALAKAIYFNLFDWIVGRVNQSLTAKGAVANSIG
ILDIYGFEIFEKNSFEQLCINYVNEKLQQIFIQLTLKAEQDEYEREQITWTPIKYFDNKV
VCSLIEDKRPPGVFAALNDACATAHADSGAADNTFVGRLNFLGQNPNFENRQGQFIIKHY
AGDVSYAVQGMTDKNKDQLLKDLLNLVQSSSNHFVHTLFPEQVNQDDKRRPPTASDKIKA
SANDLVAMLMKAQPSYIRTIKPNDNKAPKEFNESNVLHQIKYLGLQENVRIRRAGFAYRQ
TFDKFVERFYLLSPKTSYAGDYTWTGDVETGARQILKDTRIPAEEYQMGITKVFIKTPET
LFALEAMRDRYWHNMAIRIQRAWRNYLRYRTECAIRIQRFWPRMNGGLELLKLRDQGHTI
LGGRKERRRMSILGSRRFLGDYVGISNKGGPGEMIRSGAAISTSDDVLFSCRGEVLVSKF
GRSSKPSPRIFVLTNRHVYIVSQNFVNNQLVISSERTIPIGAIKTVSASSYRDDWFSLVV
GGQEPDPLCNCVFKTEFFTHLHNALRGQLNLKIGPEIEYNKKPGKLATVKVVKDGSQVDS
YKSGTIHTGPGEPPNSVSKPTPRGKQVAARPVTKGKLLRLAVQAVARPNWLPDLYQSVGL
YHSPRLKQPRRNRHQRPDPFLNQWQPLQHPIHVLHLLPPQGHHPRLLPRPPAAAGPKKAK
ALYDFSSDNNGMLSISAGQIVEIVSKEGNGWWLCMNLETSAQGWTPEAYLEEQVAPTPKP
APPPPPPVAPRASPAPVNGSAAVAAAKAKAAPPPPAKRPNMAGRKTAPAPPPAPRDSAVS
MNSQGDSSGASGRGTPSSVSNACLAGGLAEALRRRQSAMQGKQDDDDDW
>gi|17507983|ref|NP_492393.1| F29D10.4 [Caenorhabditis elegans]
MAFHWQSKVNVQHVGVDDMVLLPKLTEQSIVENLKKRLQANSIFTYIGPVLISVNPFKQM
PYFTEKEMLLYQGAAQYENAPHIYALADNMYRNMLIDNESQCVIISGESGAGKTVNAKFI
MNYISRISGGGQKVQHIKDVILQSNPLLEAFGNSATVRNWNSSRFGKYVEIVFSRGGEPI
GGKLSNFLLEKSRVVHQNEGDRNFHVFYQLCAGADKNLRSTFGIGELQYYNYLNMSGVFK
ADDTDDGKEFESTLHAMKVVGVNDQDQLEVLRIVATVLHIGNITFTEENNFAAVSGKDYL
EYPAFLLGLTSADIEAKLTGRKMESKWGTQKEEIDMKLNVEQASYTRDAWVKAIYARLFD
YLVKKVNDAMNITSQSTSDNFSVGILDIYGFEIFNNNGFEQFCINFVNEKLQQIFIELTL
KAEQEEYVREGIKWTEIDYFDNKIVCDLIETKRPPGIMSLLDDTCAQNHGQREGVDRQLL
TTLSKSFAGHPHFGPGSDSFVIKHYAGDVTYNVDGFCDRNRDVLYPDLILLMQKSSRPFI
QALFPENVAASAGKRPTTFSTKIRTQANTLVESLMKCSPHYVRCIKPNETKRPNDWEESR
VKHQVEYLGLRENIRVRRAGFAYRRAFDKFAQRYAIVSPQTWPCFQGDQQRACEIICDSV
HMEKNQYQMGKTKIFVKNPESLFLLEETRERKFDGYARVIQKAWRQFSARKQHIKQKEQA
ADLMYGKKERRRYSLNRNFVGDYIGLEHHPTLQSLVGKRQRVLFACTANKYDRKFRVTKL
DLLLTVNHLTLIGKEKVKNGPEKGKIVEVIKRQFDLPQIKSIGLSPYQDDFVILYLGNDD
YSSLLETPFKTEFCTALSKAYKERTNGTLHLDFRSSHVVSYKKMKFDFSDGKRTVQFGND
GTSSAEKTLKPNGKVLNVSIGTGLPNTTRPSTERPQGGYTPRRDQLRTSTRRTKQNNQSY
GQNGQSQAMRAPVPAHGMNNNYNQTPAPVSTNHQYSQEPARIPVMGNVINQLNNMNLSGN
GNSPAGRGPPPARGPKPPPPAKPKLNPVVIAVYPYEAQDVDELSFEAGAEIELMNKDASG
WWQGKVNNRVGLFPGNYVKE
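
For what it is worth, a file like the one above can be given a quick
sanity check independently of ClustalW and Biopython. The following is a
minimal sketch, not a full FASTA validator: check_fasta is a hypothetical
name, and the rules it applies (headers start with '>', sequence lines
contain only letters, at least two records, no carriage returns) are
assumptions about what is worth ruling out, not a statement of what
ClustalW requires.

```python
def check_fasta(text):
    """Return a list of human-readable problems found in FASTA text.

    Hypothetical helper for illustration; an empty list means none of
    the simple checks below fired.
    """
    problems = []
    if "\r" in text:
        problems.append("carriage returns found (non-Unix line endings)")
    # Normalise line endings so the remaining checks see clean lines.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    records = 0
    for number, line in enumerate(lines, 1):
        if line.startswith(">"):
            records += 1
        elif line.strip():
            if records == 0:
                problems.append(
                    "line %d: sequence data before first '>' header" % number)
            elif not line.strip().isalpha():
                problems.append(
                    "line %d: non-letter characters in sequence" % number)
    if records < 2:
        problems.append("fewer than two records; nothing to align")
    return problems


# Example use (assumes the input file name from the script):
# with open("file_to_align") as handle:
#     for problem in check_fasta(handle.read()):
#         print(problem)
```

Note that this sketch flags gap or stop characters such as '-' and '*' in
sequence lines, which may be stricter than necessary.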

I've also attempted to run the simple command line:

clustalw ./file_to_align -OUTFILE=test.aln

without success, resulting in the error message:

Error: unknown option -./file_to_align
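
The error message suggests that ClustalW read the bare path as an option
rather than as the input file. ClustalW does accept the input file as a
named option, -INFILE=, alongside -OUTFILE=, so one thing to try is naming
both explicitly. The sketch below only builds that argument list;
build_clustalw_command is a hypothetical helper, and whether this resolves
the error on clustalw 1.81 is an assumption, not something verified here.

```python
def build_clustalw_command(infile, outfile, program="clustalw"):
    """Return an argument list that names input and output explicitly.

    Hypothetical helper for illustration. The program name 'clustalw' is
    assumed to be on the PATH, as in the command line shown above.
    """
    return [program, "-INFILE=" + infile, "-OUTFILE=" + outfile]


command = build_clustalw_command("./file_to_align", "test.aln")
print(" ".join(command))

# To actually run it (the subprocess module needs Python 2.4 or later):
# import subprocess
# subprocess.call(command)
```

Printing the command first makes it easy to paste into a shell and compare
the result with what the Python wrapper produces.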


Thanks
Matt "I'm new at this" Eames
