[BioPython] Clustalw.parse_file errors

Nick Matzke matzke at berkeley.edu
Tue Aug 5 23:19:23 UTC 2008


Never mind, it turns out my alignment file was missing a line of spaces 
after each section of the alignment.  The .aln file doesn't necessarily 
have to have a consensus line with "*", ":" characters in it, but it does 
have to have at least a line of spaces of the length of the aligned 
block (this is what protein.aln has).  A completely empty line is not enough.

I inserted a line of spaces after each chunk of the alignment and now it 
parses.
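
For anyone who wants to script that fix, here is a minimal sketch in plain Python (no Biopython needed). The function name add_consensus_padding is hypothetical, and padding to the full width of the preceding sequence line is an assumption on my part; the description above only says the line must be spaces of the length of the aligned block.

```python
def add_consensus_padding(text):
    """Insert a spaces-only line after each alignment block that lacks one."""
    out = []
    prev_seq_width = 0  # width of the last sequence line seen; 0 when not in a block
    for line in text.splitlines():
        is_header = line.startswith("CLUSTAL")
        # Sequence lines start with a name; consensus/padding lines start with a space.
        is_sequence = bool(line) and not line[0].isspace() and not is_header
        if is_sequence:
            prev_seq_width = len(line)
        else:
            # A block just ended. A truly empty line here means the
            # consensus line is missing, so supply one made of spaces.
            if prev_seq_width and line == "":
                out.append(" " * prev_seq_width)
            prev_seq_width = 0
        out.append(line)
    if prev_seq_width:  # file ended directly after a block
        out.append(" " * prev_seq_width)
    return "\n".join(out) + "\n"
```

Blocks that already end in a spaces-only or consensus line are left alone, so running it twice should not add anything further.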

(My alignment wasn't generated by Clustal anyway, so I also added this 
header line to make the parser happy: "CLUSTAL W (1.83) formatted 
alignment done with PROMALS3D")
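
A small sketch for adding that header when a file lacks one. The helper name ensure_clustal_header is hypothetical, and checking only for a leading "CLUSTAL" is my own heuristic, not something confirmed about the parser; all I know is that the header line quoted above worked for me.

```python
DEFAULT_HEADER = "CLUSTAL W (1.83) formatted alignment done with PROMALS3D"

def ensure_clustal_header(text, header=DEFAULT_HEADER):
    """Prepend a Clustal-style header line unless the text already has one."""
    if text.lstrip().startswith("CLUSTAL"):
        return text  # looks like a header is already present; leave untouched
    # Header, then two empty lines, then the alignment blocks.
    return header + "\n\n\n" + text.lstrip("\n")
```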


I.e. for future readers (truncating my .aln file)...

...this got the _star_info error:
================================
CLUSTAL W (1.83) formatted alignment done with PROMALS3D


SctN_Salt  ----------------MKNEL---------------------------------------
SctN_EHEC  MISEHDSVLEKYPRIQKVLNST--------------------------------------
SctN_Chrm  ---------MRLPDIRLIENTL--------------------------------------
SctN_Yers  ---------MKLPDIARLTPRL--------------------------------------
SctN_Soda  ----------MTCNSQRLASML--------------------------------------
SctN_Laws  ----------------MALEYI--------------------------------------
SctN_Chl4  ----------------MEEITTE-------------------------------------


SctN_Salt  --------------------------MQRLRLKYPPP---------DGYCR--------W
SctN_EHEC  --------------------------VPALSLN-------------SSTRY--------E
SctN_Chrm  --------------------------RERLTLAPA---PPGQR---SGVEL--------F
SctN_Yers  --------------------------QQQLTRPSAPP---------EGLRY--------R
SctN_Soda  --------------------------AQHLTPVDEPP---------DGYRL--------T
SctN_Laws  --------------------------ASLLEEAVQNT---------SPVEV--------R
SctN_Chl4  --------------------------FNTLMTELPDV---------QLTAV--------V


===================================



...but this parsed successfully:
================================
CLUSTAL W (1.83) formatted alignment done with PROMALS3D


SctN_Salt  ----------------MKNEL---------------------------------------
SctN_EHEC  MISEHDSVLEKYPRIQKVLNST--------------------------------------
SctN_Chrm  ---------MRLPDIRLIENTL--------------------------------------
SctN_Yers  ---------MKLPDIARLTPRL--------------------------------------
SctN_Soda  ----------MTCNSQRLASML--------------------------------------
SctN_Laws  ----------------MALEYI--------------------------------------
SctN_Chl4  ----------------MEEITTE-------------------------------------


SctN_Salt  --------------------------MQRLRLKYPPP---------DGYCR--------W
SctN_EHEC  --------------------------VPALSLN-------------SSTRY--------E
SctN_Chrm  --------------------------RERLTLAPA---PPGQR---SGVEL--------F
SctN_Yers  --------------------------QQQLTRPSAPP---------EGLRY--------R
SctN_Soda  --------------------------AQHLTPVDEPP---------DGYRL--------T
SctN_Laws  --------------------------ASLLEEAVQNT---------SPVEV--------R
SctN_Chl4  --------------------------FNTLMTELPDV---------QLTAV--------V


===================================

...the difference is that the first line after each block must consist of 
spaces (or consensus characters *:. etc.), not be a completely empty line. 
The spaces are invisible, so the two listings above look the same on screen.
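
Since the difference can't be seen by eye, a quick check like the following may help. It is only a sketch: blocks_missing_consensus is a hypothetical name, and it assumes sequence lines start with a non-space character while consensus or padding lines start with a space.

```python
def blocks_missing_consensus(text):
    """Return 1-based line numbers of sequence lines that are followed by
    a completely empty line (or end of file) instead of a consensus line."""
    lines = text.splitlines()
    missing = []
    for i, line in enumerate(lines):
        is_seq = (bool(line) and not line[0].isspace()
                  and not line.startswith("CLUSTAL"))
        if not is_seq:
            continue
        # Treat running off the end of the file like an empty line.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following == "":
            missing.append(i + 1)
    return missing
```

An empty list means every block is followed by at least a spaces-only line; any numbers returned point at the last sequence line of each block that needs padding.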

Thanks for the hints!
Nick




Peter wrote:
> On Tue, Aug 5, 2008 at 10:39 PM, Nick Matzke <matzke at berkeley.edu> wrote:
>> Thanks for the help Peter, it really is a great tutorial!
>>
>> I've replaced just the ClustalIO.py file as you suggested, and it parses
>> both the example.aln and protein.aln files.
> 
> Good :)
> 
>> However I tried a ClustalW-formatted alignment file I made a while ago with
>> my own data and still got the _star_info error:
>>
>> AttributeError: Alignment instance has no attribute '_star_info'
>>
>> But my file could be weird.  Does the _star_info error indicate alphabet
>> issues or something?
> 
> The _star_info is a nasty private variable used to store the ClustalW
> consensus, used if writing the file back out again in clustal format.
> The error suggests something else has gone wrong with the consensus
> parsing... (and shouldn't be anything to do with the alphabet).
> 
> Could you file a bug, and (after filing the bug) could you upload one
> of these example files to the bug as an attachment please?
> 
> Peter
> 

-- 
====================================================
Nicholas J. Matzke
Ph.D. student, Graduate Student Researcher
Huelsenbeck Lab
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Lab website: http://ib.berkeley.edu/people/lab_detail.php?lab=54
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/~edna/lab_test/members/matzke.html
Lab phone: 510-643-6299
Dept. fax: 510-643-6264
Cell phone: 510-301-0179
Email: matzke at berkeley.edu

Office hours for Bio1B, Spring 2008: Biology: Plants, Evolution, Ecology
VLSB 2013, Monday 1-1:30 (some TA there for all hours during work week)

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140
====================================================


