[Biopython-dev] Re: [BioXML-dev] XSLT and Martel output

Bradley Marshall bradmars at yahoo.com
Tue Sep 5 14:41:09 EDT 2000


So I went back and checked the Python XML-SIG mailing
list, and Fourthought claims that 4XSLT 0.9.2 is up to
100 times faster than 0.8.2.  However, it wasn't
available from their web site.  There was a link to
the RPMs, though, and there I found 4XSLT 0.9.2.  So,
if anybody wants it, it's at:

ftp://fourthought.com/pub/mirrors/python4linux/redhat/i386/4XSLT-0.9.2-1.i386.rpm


Brad


--- Bradley Marshall <bradmars at yahoo.com> wrote:
> 
> It looks great, Andrew.
> 
> I haven't crunched any numbers, but my gut feeling is
> that xt (from jclark.com) is probably 5-10 fold faster
> than 4XSLT.  Unfortunately, 4XSLT is the only Python
> XSLT processor that I know of.  It's good, but slow.
> 
> On the plus side, xt works quite nicely in JPython.
> 
> Brad
> 
> --- Andrew Dalke <dalke at acm.org> wrote:
> > Hello,
> >
> > With some pointers from Brad I managed to get an
> > XSLT converter for
> > the Martel SWISS-PROT output into FASTA.  I would
> > have tried an XML
> > one, but wasn't sure which to use.
> >
> > The input was the example output file I have at
> >   http://www.biopython.org/~dalke/Martel/BOSC2000.poster/sample.xml.txt
> > This has 8 records and is about 60K long.
> >
> > The XSLT engine I used is 4XSLT from Fourthought.  BTW, it was
> > entirely too complicated to install, esp. since there aren't any
> > instructions and there seems to be a missing file from one of
> > the distributions (but which is in the other).  :(
> >
> > The actual XSLT text I used is
> >
> > <?xml version="1.0"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
> >
> > <xsl:template match="//swissprot38_record">
> > <xsl:text disable-output-escaping="yes">&gt;sp|</xsl:text>
> > <xsl:value-of select="*/ac_number"/>
> > <xsl:text disable-output-escaping="yes">|</xsl:text>
> > <xsl:value-of select="*/entry_name"/>
> > <xsl:for-each select="DE_block/DE/description">
> > <xsl:text> </xsl:text>
> > <xsl:value-of select="."/>
> > </xsl:for-each>
> > <xsl:text>
> > </xsl:text>
> > <xsl:for-each select="sequence_block/SQ_data_block/SQ_data/sequence">
> > <xsl:value-of select="translate(., ' ', '')"/>
> > <xsl:text>
> > </xsl:text>
> > </xsl:for-each>
> > <xsl:if test="position()!=last()">
> > <xsl:text>
> > </xsl:text>
> > </xsl:if>
> > </xsl:template>
> >
> > </xsl:stylesheet>
> >
> > Example output looks like:
> > ====
> > >sp|Q43495|108_LYCES PROTEIN 108 PRECURSOR.
> > MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSP
> > TASTECCNAVQSINHDCMCNTMRIAAQIPAQCNLPPLSCSAN
> >
> > >sp|P18646|10KD_VIGUN 10 KD PROTEIN PRECURSOR (CLONE PSAS10).
> > MEKKSIAGLCFLFLVLFVAQEVVVQSEAKTCENLVDTYRGPCFTTGSCDDHCKNKEHLLS
> > GRCRDDVRCWCTRNC
> > ====
> >
> > It took about 3.5 seconds to load the file into the DOM and about 1.5
> > seconds to process it.  Since there are 80,000 records in sprot38, it
> > would take nearly 14 hours to convert everything.  It would take about
> > 20 minutes to translate it using a SAX-based converter, so a factor
> > of 70 slower.
> >
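
The 14-hour figure above is a linear scale-up of the sample timings. A minimal sketch of that arithmetic, assuming conversion time grows linearly with the number of records (the variable names are only illustrative):

```python
# Figures quoted in the message above.
records_in_sample = 8
seconds_per_sample = 3.5 + 1.5      # DOM load time + XSLT processing time
records_in_sprot38 = 80_000
sax_minutes = 20                    # quoted estimate for a SAX-based converter

# Assumption: time scales linearly with the number of records.
xslt_seconds = seconds_per_sample * records_in_sprot38 / records_in_sample
xslt_hours = xslt_seconds / 3600
ratio = (xslt_seconds / 60) / sax_minutes

print(f"XSLT estimate: {xslt_hours:.1f} hours")
print(f"Ratio to SAX:  {ratio:.0f}x")
```

With the figures as quoted this gives about 13.9 hours and a ratio of about 42 to the 20-minute SAX estimate. A factor of 70 would correspond to a SAX run of roughly 12 minutes, so one of the quoted figures is presumably rounded.
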
> > Of course, it would also require that I have enough memory since the
> > DOM I'm using (4DOM, also from Fourthought) keeps everything in
> > RAM.
> >
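
For comparison, a SAX-style converter only needs to hold one record at a time. The following is a rough sketch using Python's standard xml.sax module, not the converter referred to above. The element names come from the stylesheet; the toy input, its ID and AC wrapper elements, and the FastaHandler class are assumptions made for illustration.

```python
import io
import xml.sax


class FastaHandler(xml.sax.ContentHandler):
    """Stream Martel-style SWISS-PROT XML and write FASTA one record at a time."""

    WANTED = ("ac_number", "entry_name", "description", "sequence")

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.fields = None      # text collected for the current record
        self.current = None     # wanted element being read, if any
        self.buf = []
        self.count = 0

    def startElement(self, name, attrs):
        if name == "swissprot38_record":
            self.fields = {key: [] for key in self.WANTED}
        elif name in self.WANTED and self.fields is not None:
            self.current = name
            self.buf = []

    def characters(self, content):
        if self.current is not None:
            self.buf.append(content)

    def endElement(self, name):
        if name == self.current:
            self.fields[name].append("".join(self.buf))
            self.current = None
        elif name == "swissprot38_record":
            self.write_record()
            self.fields = None   # drop the record so memory use stays flat

    def write_record(self):
        f = self.fields
        if self.count:
            self.out.write("\n")             # blank line between records
        self.count += 1
        # Like xsl:value-of, use only the first accession and entry name.
        header = ">sp|%s|%s" % ((f["ac_number"] or [""])[0],
                                (f["entry_name"] or [""])[0])
        for text in f["description"]:
            header += " " + text
        self.out.write(header + "\n")
        for line in f["sequence"]:
            self.out.write(line.replace(" ", "") + "\n")


# Abbreviated toy record; the ID and AC wrapper elements are hypothetical.
SAMPLE = b"""<records>
<swissprot38_record>
<ID><entry_name>108_LYCES</entry_name></ID>
<AC><ac_number>Q43495</ac_number></AC>
<DE_block><DE><description>PROTEIN 108 PRECURSOR.</description></DE></DE_block>
<sequence_block><SQ_data_block>
<SQ_data><sequence>MASVKSSSSS SSSSFISLLL</sequence></SQ_data>
<SQ_data><sequence>TASTECCNAV QSINHDCMCN</sequence></SQ_data>
</SQ_data_block></sequence_block>
</swissprot38_record>
</records>"""

out = io.StringIO()
xml.sax.parse(io.BytesIO(SAMPLE), FastaHandler(out))
fasta = out.getvalue()
print(fasta, end="")
```

The handler discards each record once it has been written, which is why the memory requirement does not grow with the size of the input.
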
> > There are some performance things you need to learn using XSLT (or at
> > least tricks specific to this engine).  For example
> >   <xsl:for-each select="sequence_block/SQ_data_block/SQ_data/sequence">
> > is a lot faster (20-fold or so!) than
> >   <xsl:for-each select="*//sequence">
> >
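
The two select expressions pick out the same nodes; the difference is that the explicit path only follows the named children, while a descendant search has to walk the whole subtree. A small sketch of the equivalence using the standard library's ElementTree, whose path language is only a subset of XPath and is not the 4XSLT engine, so it says nothing about the 20-fold timing. The toy record is an assumption based on the paths in the stylesheet, and ".//sequence" stands in for the "*//sequence" form.

```python
import xml.etree.ElementTree as ET

# Toy record; the nesting is inferred from the stylesheet's select paths.
record = ET.fromstring(
    "<swissprot38_record>"
    "<sequence_block><SQ_data_block>"
    "<SQ_data><sequence>MASVK SSSSS</sequence></SQ_data>"
    "<SQ_data><sequence>TASTE CCNAV</sequence></SQ_data>"
    "</SQ_data_block></sequence_block>"
    "</swissprot38_record>"
)

# Explicit path: only the named children are followed at each level.
explicit = record.findall("sequence_block/SQ_data_block/SQ_data/sequence")

# Descendant search: every element below the record is a candidate.
searched = record.findall(".//sequence")

same = [e.text for e in explicit] == [e.text for e in searched]
print(len(explicit), len(searched), same)
```
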
> > It's a good thing that FASTA doesn't mandate that
> > all sequence lines
> > (excepting the last) must be 65 characters long.
> > The SWISS-PROT
> > sequence lines are 60 characters long, and I can't
> > figure out how to
> > wrap them to different lengths.
> >
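
One way around the wrapping problem is to leave the stylesheet alone and re-wrap its output afterwards. A minimal post-processing sketch in Python; the rewrap_fasta helper is hypothetical and is not part of Martel or 4XSLT.

```python
def rewrap_fasta(text, width=65):
    """Re-wrap the sequence lines of FASTA text to `width` characters."""
    out = []
    seq = []

    def flush():
        # Join the collected sequence lines and cut them to the new width.
        residues = "".join(seq)
        for i in range(0, len(residues), width):
            out.append(residues[i:i + width])
        seq.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            flush()
            if out:
                out.append("")      # keep a blank line between records
            out.append(line)
        elif line.strip():
            seq.append(line.strip())
    flush()
    return "\n".join(out) + "\n"


# First record from the example output in this thread.
example = (
    ">sp|Q43495|108_LYCES PROTEIN 108 PRECURSOR.\n"
    "MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSP\n"
    "TASTECCNAVQSINHDCMCNTMRIAAQIPAQCNLPPLSCSAN\n"
)
print(rewrap_fasta(example, width=65), end="")
```

Header lines pass through unchanged, and every sequence line except the last in each record comes out at exactly the requested width.
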
> >
> > On the other hand, it *does* work, and the performance of the engines
> > should go up over time (e.g., there is usually about a factor of 5-10
> > by translation into C).  Plus, in theory you should be able to make
> > it work with other XSLT tools.  Anyone want to try it with XT, or one
> > of the browsers (does Mozilla or Opera support XSLT)?
> >
> > Better yet, want to start playing around with the
> > BLAST output from
> > Martel?  :)
> >
> > Andrew
> > dalke at acm.org
> > _______________________________________________
> > BioXML-dev mailing list  -  BioXML-dev at bioxml.org
> > http://bioxml.org/mailman/listinfo/bioxml-dev

