[Biopython-dev] Slight modification to BlastXML parser for AB-BLAST input

Wibowo Arindrarto bow at bow.web.id
Thu Dec 13 04:15:01 UTC 2012


Hi Colin,

Thanks for the report. AB-BLAST wasn't included in the BLAST XML
parser's test suite so I'm glad you spotted this :).

You're proposing a bug fix, so yes, this should be included in our code.
You could submit a pull request on our GitHub page:
https://github.com/biopython/biopython/pulls, or I can submit it on
your behalf if you prefer not to submit it yourself. If you're not
familiar with GitHub, we have a quick guide on how to use it to
develop Biopython here: http://biopython.org/wiki/GitUsage. GitHub's
help on how to submit pull requests is a useful read too:
https://help.github.com/articles/using-pull-requests

Along with the patch, a unit test on the AB-BLAST output would also be very
welcome.
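
To give a rough idea of what such a test might check, here is a minimal
sketch that exercises only the version pattern, using the two version
strings quoted in Colin's message below. The names `_RE_VERSION` and
`VersionPatternTest` are placeholders for illustration, not names from the
Biopython code base, and a real test would presumably parse an actual
AB-BLAST XML file rather than bare strings.

```python
import re
import unittest

# Pattern proposed in Colin's message; the variable name is illustrative only.
_RE_VERSION = re.compile(r'(\d+\.(?:\d+\.)*\d+)(?:\w+-\w+|\+)?')


class VersionPatternTest(unittest.TestCase):
    """Check the proposed pattern against both quoted version strings."""

    def test_ncbi_blast_version(self):
        match = _RE_VERSION.search('BLASTN 2.2.27+')
        self.assertIsNotNone(match)
        # Whole match keeps the trailing '+', group 1 is the bare number.
        self.assertEqual(match.group(0), '2.2.27+')
        self.assertEqual(match.group(1), '2.2.27')

    def test_ab_blast_version(self):
        text = ('3.0PE-AB [2009-10-30] '
                '[linux26-x64-I32LPF64 2009-11-17T18:52:53]')
        match = _RE_VERSION.search(text)
        self.assertIsNotNone(match)
        # Whole match includes the 'PE-AB' suffix, group 1 does not.
        self.assertEqual(match.group(0), '3.0PE-AB')
        self.assertEqual(match.group(1), '3.0')
```

Saved to a file, this could be run with `python -m unittest <module name>`.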

As for the actual regex change, I was wondering, is that the only
possible pattern of the BlastOutput_version tag in AB-BLAST? Do you
have examples of any other version output from AB-BLAST?
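
One way to explore that question is to run the proposed pattern over a few
candidate strings and see what it extracts. The sketch below does this; the
first two strings come from Colin's message, while the last two are made up
for illustration and are not real AB-BLAST output. The helper name
`extract_version` is also just a placeholder.

```python
import re

# Pattern proposed in Colin's message below.
_RE_VERSION = re.compile(r'(\d+\.(?:\d+\.)*\d+)(?:\w+-\w+|\+)?')


def extract_version(text):
    """Return the matched version string, or None if nothing matches."""
    match = _RE_VERSION.search(text)
    return match.group(0) if match else None


candidates = [
    # Taken from Colin's message.
    'BLASTN 2.2.27+',
    '3.0PE-AB [2009-10-30] [linux26-x64-I32LPF64 2009-11-17T18:52:53]',
    # Hypothetical strings, invented only to probe the pattern.
    '3.0SE-AB [2009-10-30]',
    'AB-BLAST',
]

for text in candidates:
    print(repr(text), '->', repr(extract_version(text)))
```

A string with no dotted number at all (like the last one) yields None, so
the parser would still need to handle a failed match.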

cheers,
Bow


P.S. CC-ed to the Biopython-dev mailing list


On Thu, Dec 13, 2012 at 4:41 AM, Colin Archer <ctnarcher at gmail.com> wrote:
> Hi Bow,
> I have been using your implementation of the Biopython BLAST
> output parser for AB-BLAST input, and it has been working OK so far,
> although I haven't looked thoroughly at the speed yet. I initially found
> that the version tag (BlastOutput_version) for AB-BLAST results was slightly
> different from NCBI BLAST's, and I changed the regex you implemented to cover
> both versions. The difference between them was:
>
>   <BlastOutput_version>BLASTN 2.2.27+</BlastOutput_version>
>   <BlastOutput_version>3.0PE-AB [2009-10-30] [linux26-x64-I32LPF64
> 2009-11-17T18:52:53]</BlastOutput_version>
>
>
> and the regex I ended up using was:
> r'(\d+\.(?:\d+\.)*\d+)(?:\w+-\w+|\+)?'
>
> and here is the tested output:
> >>> _RE_VERSION1 = re.compile(r'\d+\.\d+\.\d+\+?')
> >>> _RE_VERSION2 = re.compile(r'(\d+\.(?:\d+\.)*\d+)(?:\w+-\w+|\+)?')
> >>> version1
> 'BLASTN 2.2.27+'
> >>> version2
> '3.0PE-AB [2009-10-30] [linux26-x64-I32LPF64 2009-11-17T18:52:53]'
> >>> re.search(_RE_VERSION1, version1).group(0)
> '2.2.27+'
> >>> re.search(_RE_VERSION2, version1).group(0)
> '2.2.27+'
> >>> re.search(_RE_VERSION1, version2).group(0)
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
> AttributeError: 'NoneType' object has no attribute 'group'
> >>> re.search(_RE_VERSION2, version2).group(0)
> '3.0PE-AB'
>
> Would there be any chance of including this in a future release of
> Biopython?
>
> Thanks
> Colin
>
>


