[Biopython-dev] Adding overall coverage to blast-xml

John T jaytee00 at gmail.com
Wed Sep 13 10:09:54 UTC 2017

Hi Bow & Peter,

That seems fair enough. I'll check whether `HSP.hit_span_all` and
`HSP.query_span_all` give the same values as qcov*, and add my thing to the
cookbook if they don't.
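
For reference, here is a minimal sketch of the kind of calculation under
discussion: the overall query coverage of a hit, taken as the union of its
HSP ranges on the query divided by the query length. The function name
`merged_coverage` and its arguments are hypothetical (not part of Biopython),
the ranges are assumed to be 0-based, half-open (start, end) pairs, and
whether the result always agrees with BLAST's qcovs (rounding included) is
exactly what still needs checking.

```python
def merged_coverage(ranges, query_length):
    """Return the fraction of the query covered by the union of ranges.

    ranges: iterable of (start, end) pairs, 0-based and half-open
            (an assumption of this sketch).
    query_length: length of the query sequence.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")

    covered = 0
    block_start = block_end = None
    for start, end in sorted(ranges):
        if block_end is None or start > block_end:
            # Gap before this range: close the previous block, start a new one.
            if block_end is not None:
                covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent range: extend the current block.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start

    return covered / query_length


# Two overlapping HSPs and one separate HSP on a 200-base query.
example = merged_coverage([(0, 50), (40, 100), (150, 200)], 200)
```

With SearchIO the pairs could probably come from `hsp.query_range` for each
HSP in a hit, and the length from the query result's `seq_len`, but I have
not verified that against real BLAST output yet.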


On 12 September 2017 at 11:12, Wibowo Arindrarto <bow at bow.web.id> wrote:

> Hi John, Peter,
> Thanks for looking into this. Indeed the `qcov` and `qcovhsp` values are
> already captured when parsing `blast-tab` formats with SearchIO. They are
> accessible via the `Hit.query_coverage` and `HSP.query_coverage` attributes
> respectively.
> For non-`blast-tab` formats such as `blast-xml`, there may already be
> similar attributes you can use in SearchIO: `HSP.hit_span_all` and
> `HSP.query_span_all`, which calculate how many bases are covered by the
> hit and query, respectively. You can find more about them here:
> http://biopython.org/DIST/docs/api/Bio.SearchIO._model.hsp.HSP-class.html
> However, I don't recall whether these values are always the same as the
> qcov values BLAST outputs when using the `blast-tab` format. That may be
> worth checking first.
> That being said, if they do not end up being always equal (i.e. there are
> corner cases in BLAST that those attributes do not properly capture), I
> also am quite hesitant to add more derived values for specific file
> formats. The main reason is I feel this would make the current object model
> more brittle. We are already quite brittle in some respects, because of the
> fact that the formats we are parsing do not have precise specifications.
> I am very open to having this put in our cookbook (
> http://biopython.org/wiki/Category%3ACookbook), however, since this would
> be more akin to a BLAST-specific recipe that many would find useful.
> Let me know what you think.
> Cheers,
> Bow
> -------- Original Message --------
> Subject: Re: [Biopython-dev] Adding overall coverage to blast-xml
> Local Time: September 12, 2017 11:28 AM
> UTC Time: September 12, 2017 9:28 AM
> From: p.j.a.cock at googlemail.com
> To: John T <jaytee00 at gmail.com>
> bow at bow.web.id <bow at bow.web.id>, Biopython-Dev Mailing List <
> biopython-dev at mailman.open-bio.org>
> Hi John,
> At least if it's qcovs (or qcovhsp) there is a clear base value
> from the NCBI code for guidance and conformance testing.
> I'm still a little uncomfortable with a parser inferring too many
> advanced values like this, but will defer to Bow as the
> SearchIO author to comment on how this might fit into his
> object model (e.g. does this have parallels in the other
> supported formats?).
> (I'm pretty sure this general topic has come up in the past,
> but couldn't immediately find an old issue or email thread.)
> Bow?
> Peter
> On Mon, Sep 11, 2017 at 10:02 PM, John T <jaytee00 at gmail.com> wrote:
>> It is indeed qcovs, I was stuck using XML, didn't investigate that
>> before. I haven't looked into the SearchIO version yet, but I assume it
>> won't be more difficult to implement there so I'm happy to commit to that.
>> John
>> On 11 September 2017 at 21:14, Peter Cock <p.j.a.cock at googlemail.com>
>> wrote:
>>> Hi John, CC Bow,
>>> Is this information in any of the BLAST output formats?
>>> I suspect you are talking about recreating one or more
>>> of the coverage fields which can be requested in the
>>> BLAST csv or tabular output file formats? i.e. qcovs
>>> or qcovhsp perhaps?
>>> If so, I can see a rationale for computing this for the
>>> BLAST XML parsing... although given we'd like to
>>> push people towards SearchIO (which supports the
>>> BLAST tabular format as well), doing it there might
>>> make more sense than in NCBIXML?
>>> Peter
>>> On Mon, Sep 11, 2017 at 5:40 PM, John T <jaytee00 at gmail.com> wrote:
>>> > Hi all,
>>> >
>>> > I've written a script that calculates the overall coverage for all HSPs
>>> > in an alignment, like that given on the BLAST homepage. Would it make
>>> > sense to add this to Bio.Blast.Record.Alignment (and/or the BlastIO
>>> > equivalent)? It's not something that's explicitly defined in the XML and
>>> > so may be out of scope, but it is something that's given in the website
>>> > version of BLAST results, so people might reasonably expect it to be
>>> > there. I'd add it as a @property so it is only calculated when required.
>>> >
>>> > I've never tried contributing to an open source project before so I
>>> > didn't want to dive in and do something stupid.
>>> >
>>> > Cheers,
>>> > John