[Biopython-dev] support for database of BOLDSYSTEMS?

Travis Wrightsman twrig002 at ucr.edu
Wed Dec 10 16:51:09 UTC 2014


It might be best to contact the general list as well to see if anyone has used BOLD before. I visited the website for a few minutes today; it seems to be a data aggregator that offers taxonomic metadata.

-Travis

> On Dec 10, 2014, at 6:31 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> 
> Dear Biopythoneers,
> 
> For those of you not following GitHub's pull request notifications,
> recent Biopython contributor Carlos Peña has submitted his code
> for the BOLD (Barcode of Life Data) System for possible inclusion
> in Biopython (email included below), see:
> https://github.com/biopython/biopython/pull/438
> 
> I'm hoping someone on the list has used BOLD before, see
> http://www.boldsystems.org/ - and could give some feedback
> please?
> 
> Or should we ask on the main mailing list?
> 
> Thanks,
> 
> Peter
> 
> ---------- Forwarded message ----------
> From: Carlos Peña <notifications at github.com>
> Date: Wed, Dec 3, 2014 at 2:48 PM
> Subject: [biopython] Proposal of new Biopython module: bold (#438)
> To: biopython/biopython <biopython at noreply.github.com>
> 
> 
> As I mentioned in an email to the dev list some time ago, I have been
> working on a module to perform calls to the BOLD database via their API.
> The BOLD database contains more than 1 million public DNA barcode
> sequences (part of the COI gene). One of the most interesting services
> is the possibility of sending the barcode sequence and retrieving the
> taxon identification and more metadata from the BOLD servers.
> 
> I just migrated the code to Biopython from a temporary GitHub
> repository. You can see the documentation here
> https://bold.readthedocs.org/en/latest/usage.html that covers all the
> API methods provided by BOLD.
> 
> This module includes unit tests with 99% coverage. The tests and
> docstrings have been run under Python 2.6, 2.7, 3.3, 3.4 and PyPy.
> 
> I completed all the work that I could think of, hence the pull
> request. I am open to feedback on this.
> 
> ________________________________
> 
> You can merge this Pull Request by running
> 
>  git pull https://github.com/carlosp420/biopython patch-30
> 
> Or view, comment on, or merge it at:
> 
>  https://github.com/biopython/biopython/pull/438
> 
> Commit Summary
> 
> copy code in Biopython
> added Experimental Warning
> added tests
> 
> File Changes
> 
> A Bio/bold/__init__.py (33)
> A Bio/bold/api.py (684)
> A Bio/bold/utils.py (32)
> A Tests/test_bold_api.py (261)
> A Tests/test_bold_utils.py (40)
> M setup.py (1)
> 
> Patch Links:
> 
> https://github.com/biopython/biopython/pull/438.patch
> https://github.com/biopython/biopython/pull/438.diff
> 
>> On Wed, Nov 5, 2014 at 10:45 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi Carlos,
>> 
>> I've not done anything with Twisted or other asynchronous mechanisms
>> for accessing online resources - services like the NCBI discourage
>> submitting multiple requests in parallel anyway.
>> 
>> One idea might be to leave that to the library's user, and focus on the
>> lower level API (building the URLs, parsing the returned values, etc)?
>> 
>> Peter
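
As a rough illustration of the lower-level approach Peter describes (building the URLs, parsing the returned values), here is a minimal Python sketch. The endpoint URL, the "db" and "sequence" parameter names, and the XML layout of the sample reply are all made up for the example; they are not taken from BOLD's actual API or from the pull request. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, used only for illustration (not BOLD's real URL).
BASE_URL = "http://example.org/api/identify"


def build_query_url(sequence, db="COX1", base_url=BASE_URL):
    """Return a GET URL carrying the barcode sequence as a query parameter."""
    # Strip whitespace and line breaks, then normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    return base_url + "?" + urlencode({"db": db, "sequence": cleaned})


def parse_matches(xml_text):
    """Turn an XML reply into a list of dicts, one per <match> element."""
    root = ET.fromstring(xml_text)
    matches = []
    for match in root.iter("match"):
        matches.append({
            "taxon": match.findtext("taxon"),
            "similarity": float(match.findtext("similarity")),
        })
    return matches


# Invented reply, shaped only to exercise the parser above.
SAMPLE_REPLY = """<matches>
  <match><taxon>Genus species</taxon><similarity>0.99</similarity></match>
  <match><taxon>Genus other</taxon><similarity>0.95</similarity></match>
</matches>"""

url = build_query_url("ttt atc\nGGA")
hits = parse_matches(SAMPLE_REPLY)
```

Keeping URL construction and reply parsing as plain functions like this would leave the choice of how to send the requests (serially, threaded, or asynchronously) to the caller.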
>> 
>> 
>>> On Tue, Nov 4, 2014 at 8:31 PM, Carlos Peña <mycalesis at gmail.com> wrote:
>>> Hi all,
>>> 
>>> 
>>> I have written an interface to the BOLD database of DNA barcodes. It accepts
>>> FASTA files, sends them to BOLD and gets the specimen identifications to the
>>> species level:
>>> 
>>> https://github.com/carlosp420/bold_retriever
>>> 
>>> I was wondering whether it could be included in Biopython? So far the
>>> package is a bunch of scripts and I want to make it more robust.
>>> The working version is not very efficient, as the running time grows
>>> quadratically (n squared).
>>> 
>>> However, I was able to use asynchronous calls (using Twisted) to make it
>>> faster. The script then took roughly n seconds for n sequences.
>>> But I don't fully understand Twisted and the package is unstable.
>>> 
>>> So, I wanted to ask if this little project of mine has any hope of getting
>>> into BioPython. If that is the case I would need some pointers on using
>>> proper classes for the code and fixing the code so that it can be
>>> integrated. I guess I would need to drop Twisted and use instead a standard
>>> Python library for multithreading.
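
For the "standard Python library for multithreading" idea, one possible sketch uses concurrent.futures from the Python 3 standard library. The identify() function here is a hypothetical stand-in that only sleeps instead of contacting a server, and the pool size is an arbitrary small number, chosen with Peter's point in mind that online services tend to discourage many parallel requests.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def identify(sequence):
    """Hypothetical stand-in for one blocking network request."""
    time.sleep(0.05)  # simulate network latency; no real call is made
    return sequence, len(sequence)


def identify_all(sequences, max_workers=4):
    """Run identify() over all sequences with a small, bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order even though calls overlap.
        return list(pool.map(identify, sequences))


results = identify_all(["ACGT", "TTGACA", "GG"])
```

Because the waiting happens in the threads, total time is bounded by the slowest batch rather than the sum of all requests, without pulling in Twisted.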
>>> 
>>> I want to improve the package anyways, make it more robust and quick. So I
>>> wanted to ask before giving another chance to Twisted.
>>> 
>>> Any comments would be appreciated,
>>> 
>>> 
>>> carlos
>>> 
>>> 
>>> Dr. Carlos Peña
>>> Laboratory of Genetics
>>> Department of Biology
>>> University of Turku
>>> 20014 Turku
>>> FINLAND
>>> 
>>> 
>>> _______________________________________________
>>> Biopython-dev mailing list
>>> Biopython-dev at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
> 


