[Biopython-dev] Re: [BioXML-dev] XSLT and Martel output
Bradley Marshall
bradmars at yahoo.com
Fri Sep 1 13:09:47 EDT 2000
It looks great, Andrew.
I haven't crunched any numbers, but my gut feeling is
that xt (from jclark.com) is probably 5-10 fold faster
than 4XSLT. Unfortunately, 4XSLT is the only Python
XSLT processor that I know of. It's good, but slow.
On the plus side, xt works quite nicely in JPython.
Brad
--- Andrew Dalke <dalke at acm.org> wrote:
> Hello,
>
> With some pointers from Brad I managed to get an XSLT converter for
> the Martel SWISS-PROT output into FASTA. I would have tried an XML
> one, but wasn't sure which to use.
>
> The input was the example output file I have at
>   http://www.biopython.org/~dalke/Martel/BOSC2000.poster/sample.xml.txt
> This has 8 records and is about 60K long.
>
> The XSLT engine I used is 4XSLT from Fourthought. BTW, it was
> entirely too complicated to install, esp. since there aren't any
> instructions and there seems to be a file missing from one of
> the distributions (but which is in the other). :(
>
> The actual XSLT text I used is
>
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="1.0">
>
>   <xsl:template match="//swissprot38_record">
>     <xsl:text disable-output-escaping="yes">>sp|</xsl:text>
>     <xsl:value-of select="*/ac_number"/>
>     <xsl:text disable-output-escaping="yes">|</xsl:text>
>     <xsl:value-of select="*/entry_name"/>
>     <xsl:for-each select="DE_block/DE/description">
>       <xsl:text> </xsl:text>
>       <xsl:value-of select="."/>
>     </xsl:for-each>
>     <xsl:text>
> </xsl:text>
>     <xsl:for-each select="sequence_block/SQ_data_block/SQ_data/sequence">
>       <xsl:value-of select="translate(., ' ', '')"/>
>       <xsl:text>
> </xsl:text>
>     </xsl:for-each>
>     <xsl:if test="position()!=last()">
>       <xsl:text>
> </xsl:text>
>     </xsl:if>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> Example output looks like:
> ====
> >sp|Q43495|108_LYCES PROTEIN 108 PRECURSOR.
> MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSP
> TASTECCNAVQSINHDCMCNTMRIAAQIPAQCNLPPLSCSAN
>
> >sp|P18646|10KD_VIGUN 10 KD PROTEIN PRECURSOR (CLONE PSAS10).
> MEKKSIAGLCFLFLVLFVAQEVVVQSEAKTCENLVDTYRGPCFTTGSCDDHCKNKEHLLS
> GRCRDDVRCWCTRNC
> ====
>
> It took about 3.5 seconds to load the file into the DOM and about 1.5
> seconds to process it. Since there are 80,000 records in sprot38, it
> would take nearly 14 hours to convert everything. It would take about
> 20 minutes to translate it using a SAX-based converter, so a factor
> of 70 slower.
>
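For anyone checking the estimate just above, the arithmetic can be sketched in a few lines of Python. This assumes the cost scales linearly with the number of records, which the message implies but does not state, and the constant and function names are made up for illustration. On these figures the ratio to the 20-minute SAX estimate comes out near 42 rather than 70, so the quoted factor presumably rests on numbers not shown here.

```python
# Figures quoted in the message; nothing here is newly measured.
SAMPLE_RECORDS = 8        # records in sample.xml.txt
LOAD_SECONDS = 3.5        # time to load the sample into the DOM
PROCESS_SECONDS = 1.5     # time to run the stylesheet over it
TOTAL_RECORDS = 80_000    # records in sprot38
SAX_MINUTES = 20          # quoted estimate for a SAX-based converter


def estimate_hours(total_records=TOTAL_RECORDS):
    """Scale the 8-record timing linearly (an assumption) to the full database."""
    seconds_per_record = (LOAD_SECONDS + PROCESS_SECONDS) / SAMPLE_RECORDS
    return seconds_per_record * total_records / 3600


xslt_hours = estimate_hours()
ratio = xslt_hours * 60 / SAX_MINUTES
print(f"{xslt_hours:.1f} hours, about {ratio:.0f}x the SAX estimate")
# prints "13.9 hours, about 42x the SAX estimate"
```
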
> Of course, it would also require that I have enough memory since the
> DOM I'm using (4DOM, also from Fourthought) keeps everything in
> RAM.
>
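To make the DOM-versus-SAX point concrete, here is a rough sketch of what a streaming converter might look like with Python's standard xml.sax module. It is not the SAX converter whose timing is quoted above, just an illustration of holding one record at a time instead of the whole tree. The element names come from the stylesheet; unlike the stylesheet, this matches them by name at any depth inside a record, which is a simplification. FastaHandler and xml_to_fasta are hypothetical names.

```python
import io
import xml.sax


class FastaHandler(xml.sax.ContentHandler):
    """Stream Martel-style SWISS-PROT XML and write FASTA, one record at a time."""

    # Element names taken from the stylesheet in the message.
    TEXT_FIELDS = ("ac_number", "entry_name", "description", "sequence")

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.record = None   # text collected for the record being read
        self.field = None    # name of the text element currently open
        self.buf = []
        self.count = 0       # records written so far

    def startElement(self, name, attrs):
        if name == "swissprot38_record":
            self.record = {f: [] for f in self.TEXT_FIELDS}
        elif self.record is not None and name in self.TEXT_FIELDS:
            self.field = name
            self.buf = []

    def characters(self, content):
        if self.field is not None:
            self.buf.append(content)

    def endElement(self, name):
        if name == self.field:
            self.record[name].append("".join(self.buf))
            self.field = None
        elif name == "swissprot38_record":
            rec = self.record
            first = lambda values: values[0] if values else ""
            if self.count:
                self.out.write("\n")   # blank line between records
            header = ">sp|%s|%s" % (first(rec["ac_number"]), first(rec["entry_name"]))
            for description in rec["description"]:
                header += " " + description
            self.out.write(header + "\n")
            for line in rec["sequence"]:
                # Same effect as translate(., ' ', '') in the stylesheet.
                self.out.write(line.replace(" ", "") + "\n")
            self.count += 1
            self.record = None   # drop the record so memory stays flat


def xml_to_fasta(xml_bytes):
    """Convert an in-memory XML document (bytes) to FASTA text."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, FastaHandler(out))
    return out.getvalue()
```

For a file the size of sprot38 one would presumably call xml.sax.parse() on the file name and write to an open output file rather than build a string, so that neither input nor output is held in RAM.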
> There are some performance things you need to learn using XSLT (or at
> least tricks specific to this engine). For example,
>   <xsl:for-each select="sequence_block/SQ_data_block/SQ_data/sequence">
> is a lot faster (20-fold or so!) than
>   <xsl:for-each select="*//sequence">
>
> It's a good thing that FASTA doesn't mandate that all sequence lines
> (excepting the last) must be 65 characters long. The SWISS-PROT
> sequence lines are 60 characters long, and I can't figure out how to
> wrap them to different lengths.
>
>
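On the wrapping question: one way around the XSLT limitation might be to rewrap the FASTA text afterwards as a separate step. A minimal sketch in Python, assuming the output fits in memory; rewrap_fasta is a hypothetical helper, not part of Martel or Biopython, and it drops the blank lines between records.

```python
def rewrap_fasta(text, width=65):
    """Rewrap the sequence lines of FASTA text to `width` columns.

    Header lines (starting with '>') pass through unchanged; the sequence
    lines of each record are joined and re-split at the new width.
    """
    out = []
    residues = []

    def flush():
        # Emit the sequence collected so far in chunks of `width` characters.
        seq = "".join(residues)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        residues.clear()

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            out.append(line)
        elif line:
            residues.append(line)
    flush()
    return "\n".join(out) + "\n"
```

Called as rewrap_fasta(fasta_text, width=65), this would turn the 60-column SWISS-PROT lines into 65-column ones, with only the last line of each record left short.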
> On the other hand, it *does* work, and the performance of the engines
> should go up over time (e.g., there is usually about a factor of 5-10
> from translation into C). Plus, in theory you should be able to make
> it work with other XSLT tools. Anyone want to try it with XT, or one
> of the browsers (does Mozilla or Opera support XSLT)?
>
> Better yet, want to start playing around with the BLAST output from
> Martel? :)
>
> Andrew
> dalke at acm.org