[Biojava-l] (no subject)
Bruno Aranda - Dev
bruno_dev at ebiointel.com
Wed Jul 21 02:52:39 EDT 2004
Hi Alexandre,
To parse the ClustalW results I use a SequenceAlignmentSAXParser and a
custom implementation of DefaultHandler, which I call
'SequenceCollectionContentHandler'.
The code for the custom DefaultHandler class is:
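import java.util.Map;

import org.biojava.bio.seq.DNATools;
import org.biojava.bio.seq.ProteinTools;
import org.biojava.bio.seq.RNATools;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class SequenceCollectionContentHandler extends DefaultHandler {

    private final Map sequenceMap;
    private final Alphabet alphabet;
    private String currentSeqName;
    private String currentSeq;

    /**
     * Creates a new <code>SequenceCollectionContentHandler</code> instance.
     *
     * @param map
     *            The map to be filled with sequences
     * @param alphabet
     *            The alphabet to be used
     */
    public SequenceCollectionContentHandler(Map map, Alphabet alphabet) {
        this.sequenceMap = map;
        this.alphabet = alphabet;
    }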

    // This method is called when an element is encountered
    public final void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) {
        if (localName.equals("Sequence")) {
            startCurrentSequence(atts);
        }
    }

    /*
     * (non-Javadoc)
     *
     * @see org.xml.sax.ContentHandler#characters(char[], int, int)
     */
    public final void characters(char[] ch, int start, int length)
            throws SAXException {
        // SAX may deliver the text of one element in several chunks, so
        // append to the current sequence instead of overwriting it.
        // Surrounding whitespace is dropped; it is never part of a sequence.
        this.currentSeq += new String(ch, start, length).trim();
    }

    /*
     * (non-Javadoc)
     *
     * @see org.xml.sax.ContentHandler#endElement(java.lang.String,
     *      java.lang.String, java.lang.String)
     */
    public final void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (localName.equals("Sequence")) {
            endCurrentSequence();
        }
    }

    private void startCurrentSequence(Attributes atts) {
        // Start every new sequence with an empty buffer
        this.currentSeq = "";
        String attName = atts.getLocalName(0);
        if (attName.equals("sequenceName")) {
            this.currentSeqName = atts.getValue(0);
        }
    }

    private void endCurrentSequence() {
        if (this.alphabet.equals(DNATools.getDNA())) {
            try {
                Sequence seq = DNATools.createDNASequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        } else if (this.alphabet.equals(RNATools.getRNA())) {
            try {
                Sequence seq = RNATools.createRNASequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        } else if (this.alphabet.equals(ProteinTools.getAlphabet())) {
            try {
                Sequence seq = ProteinTools.createProteinSequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        }
    }
}
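
As a side note, here is a minimal sketch of the same handler pattern using only the JDK's own SAX parser, so it can be tried without BioJava. It is only an illustration under assumptions: the class names SequenceHandlerSketch and CollectingHandler are hypothetical, the sample XML is made up and is not the actual event stream produced by SequenceAlignmentSAXParser, and the sequences are kept as plain Strings instead of BioJava Sequence objects. It matches elements by qName because the default JDK parser is not namespace-aware.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SequenceHandlerSketch {

    /** Collects name -> sequence text from <Sequence sequenceName="..."> elements. */
    static final class CollectingHandler extends DefaultHandler {
        private final Map<String, String> sequences = new LinkedHashMap<>();
        private final StringBuilder buffer = new StringBuilder();
        private String currentName;
        private boolean inSequence;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            if ("Sequence".equals(qName)) {
                // Remember the name and start with an empty buffer
                currentName = atts.getValue("sequenceName");
                buffer.setLength(0);
                inSequence = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign
            if (inSequence) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Sequence".equals(qName)) {
                sequences.put(currentName, buffer.toString().trim());
                inSequence = false;
            }
        }

        Map<String, String> getSequences() {
            return sequences;
        }
    }

    /** Parses the given XML text and returns the collected sequences. */
    public static Map<String, String> parse(String xml) throws Exception {
        CollectingHandler handler = new CollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getSequences();
    }

    public static void main(String[] args) throws Exception {
        // Made-up input, only to exercise the handler
        String xml = "<Alignment>"
                + "<Sequence sequenceName=\"seq1\">ACGT-ACGT</Sequence>"
                + "<Sequence sequenceName=\"seq2\">ACGTTACGT</Sequence>"
                + "</Alignment>";
        System.out.println(parse(xml));
    }
}
```

The StringBuilder is reset at every start tag and read at the matching end tag, so the handler does not depend on how the parser splits the text between characters() calls.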
Then, the code to use the SequenceAlignmentSAXParser and the handler could
be:
// copy and paste from here
// put here the path to the aln output file from ClustalW
File alnFile = new File("/your/aln/file");
// put here the alphabet to be used (e.g. DNATools.getDNA())
Alphabet alphabet = ...;
// this map will be filled with the sequences from the alignment
Map seqMap = new HashMap();
SequenceAlignmentSAXParser parser = new SequenceAlignmentSAXParser();
ContentHandler handler = new SequenceCollectionContentHandler(
        seqMap, alphabet);
try {
    BufferedReader contents = new BufferedReader(new InputStreamReader(
            new FileInputStream(alnFile)));
    parser.setContentHandler(handler);
    parser.parse(new InputSource(contents));
} catch (FileNotFoundException fnfe) {
    System.out.println(fnfe.getMessage());
    System.out.println("Couldn't open file");
} catch (IOException ioe) {
    ioe.printStackTrace();
} catch (SAXException se) {
    System.err.println(se.getMessage());
    se.printStackTrace();
}
// Finally I create the alignment object using the Map
Alignment alignment = new SimpleAlignment(seqMap);
// end of copy
So you have an Alignment instance which contains all the sequences in the
alignment. I know there are better approaches, but this one works for
me... If you have any questions, don't hesitate to ask again!
Cheers,
Bruno