[Biopython] Intel Python distribution

Alexey Morozov alexeymorozov1991 at gmail.com
Sun Sep 25 03:08:31 UTC 2016


Thank you for wanting to help, Don. I guess I'll throw in my opinion,
although it's just one user's use case rather than a developer's position.

I was using (old) Bio.pairwise2 and eventually found that it was easier and
quicker to just call EMBOSS distmat via subprocess, even though I had to write
my own parser for its output format. That pretty much sums up my experience
with (Bio)Python: it's absolutely great for converting sequences, interacting
with APIs, wrapping pipelines, doing lighter analyses and maybe plotting the
results. Simple things like "Rename all the sequences in this FASTA file
according to the data in that SQL DB" can even be done in an interactive
shell, which is undeniably cool. However, I rarely if ever do costly
computation in Biopython; instead I call tools someone else has
written in C or whatever.
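As a rough illustration of the kind of quick task mentioned above (renaming FASTA records from a database lookup), here is a minimal stdlib-only sketch. It assumes a hypothetical SQLite table `names(old_id, new_id)` and parses FASTA by hand only to stay dependency-free; with Biopython one would normally read and write the records with `Bio.SeqIO` instead.

```python
import io
import sqlite3


def rename_fasta(fasta_text: str, conn: sqlite3.Connection) -> str:
    """Return fasta_text with each record ID replaced via the `names` table.

    Assumes a hypothetical table names(old_id TEXT PRIMARY KEY, new_id TEXT).
    Records whose ID is not found keep their original header line.
    """
    out = io.StringIO()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The record ID is the first whitespace-separated word after '>';
            # anything after it is kept as the description.
            parts = line[1:].split(maxsplit=1)
            old_id = parts[0] if parts else ""
            row = conn.execute(
                "SELECT new_id FROM names WHERE old_id = ?", (old_id,)
            ).fetchone()
            if row is not None:
                rest = parts[1] if len(parts) > 1 else ""
                line = ">" + row[0] + (" " + rest if rest else "")
        # Sequence lines (and unmatched headers) are written unchanged.
        out.write(line + "\n")
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical in-memory mapping table for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (old_id TEXT PRIMARY KEY, new_id TEXT)")
    conn.execute("INSERT INTO names VALUES ('seq1', 'E_coli_K12')")
    print(rename_fasta(">seq1 some description\nACGT\n>seq2\nGGCC\n", conn), end="")
    conn.close()
```

Doing one parameterized query per header keeps the sketch short; for large files it would likely be faster to load the whole mapping into a dict first.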

The point: the performance of Biopython itself is very rarely a bottleneck. If
you manage to make alignment (pairwise and multiple), statistical analysis
of trees (consensus networks, supertrees, consensus trees and such),
distance calculations and maybe even some searching within sequences
(like HMMs or intron prediction) run fast enough that I don't have to bother
installing extra tools and writing wrappers, that'll be great. Although, I'm
afraid, the practical way to do that is to write a lot of wrappers and bloat
the distribution with all the tools.

2016-09-23 23:03 GMT+08:00 Gunning, Don <don.gunning at intel.com>:

> Peter
>
> Thanks for the reply.
>
> Regarding disk I/O, this is a universal issue. The main solutions
> available are code restructuring and the use of SSDs to hide latency.
>
> Regarding Pairwise2, we are looking for packages that we can include in
> our distribution. Can anyone advise how widely it is used and whether there
> is a need for further performance improvement?
>
> Regards
>
> Don
>
> -----Original Message-----
> From: Peter Cock [mailto:p.j.a.cock at googlemail.com]
> Sent: Thursday, September 01, 2016 5:31 AM
> To: Gunning, Don <don.gunning at intel.com>
> Cc: biopython at biopython.org; Biopython-Dev Mailing List <
> biopython-dev at mailman.open-bio.org>
> Subject: Re: [Biopython] Intel Python distribution
>
> Hi Don,
>
> Biopython covers so many topics that everyone using it probably
> has a different bottleneck. I tend to do basic sequence manipulations
> where disk I/O is the main bottleneck - although problems in this
> area often lie more in the end user's script (e.g. taking advantage
> of Python sets rather than lists for membership checking).
>
> I do know that the old Bio.pairwise2 code was performing poorly
> on larger sequences (this has a C backend), but that has been
> improved with a rewrite in our latest release, Biopython 1.68.
>
> Hopefully some of our community will volunteer to talk about
> where they think Biopython needs some optimisation?
>
> Regards,
>
> Peter
>
> On Wed, Aug 31, 2016 at 2:34 PM, Gunning, Don <don.gunning at intel.com>
> wrote:
> >
> > Intel has just announced the Intel Python distribution, an open source
> > version with many packages optimized for performance:
> >
> > https://software.intel.com/en-us/python-distribution
> >
> > The life sciences market is an area we are trying to help with, and your
> > project seemed interesting as it is aligned with our thinking.
> >
> > Could someone write back to me and discuss how we could contribute to your
> > project and get our distribution more widely used in your community? One
> > thought is for Intel to optimize and include packages that are regularly
> > used in your community.
> >
> > We look forward to hearing from you and potentially collaborating.
> >
> > Regards
> >
> > Don
> >
> > Don Gunning
> > Software Program Manager
> > Technical computing group
> > Developer Product Division
> > Intel Corporation
> > 1906 Fox Dr
> > Champaign Il 61820
> > 217 403 4213
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
>
>
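Peter's aside about preferring Python sets over lists for membership checking can be shown with a small sketch. The read IDs below are made up, and the printed timings will vary from machine to machine; the point is only that both containers give the same answer while the lookup cost differs.

```python
import timeit


def keep_wanted(record_ids, wanted):
    """Return the IDs from record_ids that are also in `wanted`, in order."""
    # `rid in wanted` scans a list element by element (O(n) per check),
    # but is an average O(1) hash lookup when `wanted` is a set.
    return [rid for rid in record_ids if rid in wanted]


if __name__ == "__main__":
    # Hypothetical data: 5,000 read IDs, every other one of which is wanted.
    ids = [f"read_{i}" for i in range(5_000)]
    wanted_list = ids[::2]
    wanted_set = set(wanted_list)

    # Same result either way; only the cost of each membership test differs.
    assert keep_wanted(ids, wanted_list) == keep_wanted(ids, wanted_set)

    t_list = timeit.timeit(lambda: keep_wanted(ids, wanted_list), number=1)
    t_set = timeit.timeit(lambda: keep_wanted(ids, wanted_set), number=1)
    print(f"list: {t_list:.4f} s, set: {t_set:.4f} s")
```

Building the set costs one pass over the wanted IDs, which pays off as soon as more than a handful of lookups are made against it.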



-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.