[Biopython-dev] Slight modification to BlastXML parser for AB-BLAST input
Wibowo Arindrarto
bow at bow.web.id
Thu Dec 13 16:14:27 UTC 2012
Hi Colin,
> From what I have seen, the version value is formatted
> differently based on the edition of AB-BLAST being used: personal,
> commercial, etc. As I only use the personal edition, I'm not sure if the
> other versions are different but I imagine that they conform to the same
> format, with the version followed by the edition (for example, 3.0PE-AB for
> personal edition). The regex I sent you will keep the edition so I imagine
> it will work on other versions of AB-BLAST as long as the edition is
> represented by "words-words".
Ok then. The regex looks good. You can probably make it more
reader-friendly by separating the regex for NCBI and AB BLAST (e.g.
r'(?:ncbi_blast_regex)|(?:ab_blast_regex)'). But even without this, it
seems to work ok.
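
To illustrate the separation suggested above, here is a minimal sketch. The
two sub-patterns and the function name extract_version are hypothetical,
written only for this example; they are not the regex from the pull request,
and the AB-BLAST pattern simply assumes the "version followed by words-words"
shape described in the quoted text.

```python
import re

# Hypothetical sub-patterns, for illustration only.
NCBI_VERSION = r"\d+\.\d+\.\d+\+?"   # assumed NCBI shape, e.g. 2.2.27+
AB_VERSION = r"\d+\.\d+\w+-\w+"      # assumed AB-BLAST shape, e.g. 3.0PE-AB

# Each program gets its own non-capturing group, so either half can be
# read and changed without touching the other.
VERSION_RE = re.compile(r"(?:%s)|(?:%s)" % (NCBI_VERSION, AB_VERSION))


def extract_version(text):
    """Return the first version-like token in text, or None if absent."""
    match = VERSION_RE.search(text)
    return match.group(0) if match else None
```

With these assumed patterns, a string such as "BLASTP 3.0PE-AB" keeps the
edition suffix, while an NCBI-style string such as "BLASTN 2.2.27+" is
matched by the first alternative.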
> I'll submit a pull request as well and submit the revised regex. If you are
> interested, there are a couple other differences in the XML output between
> AB-BLAST and NCBI-BLAST. I can send you an example output if you would like
> to have a look at it. Presently, SearchIO can't parse AB-BLAST XML output
> for multiple queries as the AB-BLAST output is just a concatenation of
> multiple single queries. Each query contains the <?xml version ...> section
> at the beginning and causes ElementTree to error during iteration. To get
> around this I have been piping the AB-BLAST output and parsing it into a
> more NCBI-BLAST form.
Hmm... it is a problem if AB-BLAST concatenates outputs like that. It
makes the XML invalid, though, so I'm not sure if we should change the
parser to tolerate this. What are the other differences?
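
One way to handle this outside the parser is to split the stream at each XML
declaration and parse the pieces one at a time. The sketch below is only an
assumption about how such a pre-processing step could look; the function
names are made up for this example and are not part of SearchIO, and it
assumes every document starts with its own <?xml ...?> declaration and that
the whole output fits in memory.

```python
import re
import xml.etree.ElementTree as ET

# Matches the start of an XML declaration, which opens each document.
XML_DECL = re.compile(r"<\?xml\b")


def split_xml_documents(text):
    """Yield each XML document found in a concatenated string."""
    starts = [m.start() for m in XML_DECL.finditer(text)]
    # Pair each declaration offset with the next one (or the end of text).
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunk = text[begin:end].strip()
        if chunk:
            yield chunk


def parse_concatenated(text):
    """Parse every document separately and return the root elements."""
    return [ET.fromstring(chunk) for chunk in split_xml_documents(text)]
```

Each chunk is a well-formed document on its own, so ElementTree no longer
sees a second declaration in the middle of its input.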
As for the example files, they would indeed be useful for unit testing
(as long as they're not too big, say, less than 50K?). You can send them
to me. If you're feeling it, you can also write your own unit tests
using them :).
Looking forward to the pull request :),
Bow