From redmine at redmine.open-bio.org Thu Dec 1 12:08:47 2016
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 01 Dec 2016 12:08:47 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3074] (Migrated) Please support additional fields in the SeqIO embl parser
References:
Message-ID:

Issue #3074 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated

While the DR and KW lines have been done, the DT lines are still currently ignored.

Filed as a new issue on GitHub (since we are retiring the RedMine issue tracker), see:

https://github.com/biopython/biopython/issues/1016

----------------------------------------
Bug #3074: Please support additional fields in the SeqIO embl parser
https://redmine.open-bio.org/issues/3074#change-15385

* Author: Wim De
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.53
* URL:
----------------------------------------
Sequences returned from the Bio.SeqIO parser for 'embl' files don't contain a parsed version of at least the following fields:

DT (date)
DR (database cross references)

Possibly also missing:

KW the keywords field
dataclass field in the ID field

It would be useful to me, and I imagine to others, to have access to these additional fields, which are in the original EMBL files. Without them, if you parse EMBL files, manipulate the sequence and write out the result, you either lose data or, if you wish to hold on to it, have to manually add the original data back into the file.

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Thu Dec 1 12:14:30 2016
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 01 Dec 2016 12:14:30 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3060] (Migrated) Add ungap method to the SeqRecord?
References:
Message-ID:

Issue #3060 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1017

Moved to GitHub as https://github.com/biopython/biopython/issues/1017

----------------------------------------
Bug #3060: Add ungap method to the SeqRecord?
https://redmine.open-bio.org/issues/3060#change-15387

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.54b
* URL: https://github.com/biopython/biopython/issues/1017
----------------------------------------
Biopython 1.53 added an ungap method to the Seq object. This is a possible enhancement request to add a matching ungap method to the SeqRecord object, where the per-letter-annotation and features should be adjusted to match.

My motivating example is to take an ACE file loaded with SeqIO, remove the gaps, and output the contigs as FASTQ or QUAL files. This requires the per-letter-annotation to be sliced to match the ungapped sequence. Likewise any features fully contained within ungapped regions should be retained and their co-ordinates shifted.

I'm not sure if we should do anything about features spanning a gap - the simple option which I have implemented is they are lost. This is done via the existing SeqRecord slicing and addition code.

Patch to follow...

See also Bug 3054 for adding upper and lower methods to the SeqRecord, and the broader discussion on Bug 2351 about strings, Seq and SeqRecord objects.
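The slice-and-join idea described above can be illustrated without Biopython at all. The following is only a rough sketch, not the attached patch: it works on a plain string and a plain list standing in for a sequence and its per-letter annotation, the function name ungap_with_annotation is made up for this illustration, and the default gap character "-" is an assumption. Feature handling is left out entirely.

```python
def ungap_with_annotation(seq, quality, gap="-"):
    """Drop gap characters from seq and the matching per-letter values.

    Hypothetical helper for illustration only; seq is a plain string and
    quality a list of the same length (e.g. PHRED scores).
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and per-letter annotation differ in length")
    kept_letters = []
    kept_quality = []
    start = 0
    while start < len(seq):
        if seq[start] == gap:
            # Skip over gap columns one at a time.
            start += 1
            continue
        # Find the end of this gap-free run, then slice both in step.
        end = start
        while end < len(seq) and seq[end] != gap:
            end += 1
        kept_letters.append(seq[start:end])
        kept_quality.extend(quality[start:end])
        start = end
    # Joining the slices mirrors the "slicing and addition" approach.
    return "".join(kept_letters), kept_quality
```

For example, calling it with "AC-GT--A" and eight quality values returns the five letters "ACGTA" together with the five values that sat under them.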
---Files--------------------------------
seqrecord_ungap.patch (4.74 KB)

From redmine at redmine.open-bio.org Thu Dec 1 12:16:38 2016
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 01 Dec 2016 12:16:38 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2818] (Resolved) Add start and end properties to SeqFeature object
References:
Message-ID:

Issue #2818 has been updated by Peter Cock.

Status changed from New to Resolved
% Done changed from 0 to 100

Resolved with changes to Biopython since, including the new CompoundLocation object for joins, and making the position objects more integer like.

----------------------------------------
Bug #2818: Add start and end properties to SeqFeature object
https://redmine.open-bio.org/issues/2818#change-15388

* Author: Peter Cock
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
An enhancement proposed on the mailing list would add start and end properties to the SeqFeature returning plain integers (non-fuzzy approximations to the start and end locations) suitable for slicing most parent sequences. Dealing with a join location would still be tricky.
Example usage:

>>> from Bio import SeqIO
>>> record = SeqIO.read(open("NC_005816.gb"),"gb")
>>> feature = record.features[2]
>>> print feature
type: gene
location: [86:1109]
ref: None:None
strand: 1
qualifiers:
    Key: db_xref, Value: ['GeneID:2767718']
    Key: locus_tag, Value: ['YP_pPCP01']

>>> record[feature.start:feature.end]
SeqRecord(seq=Seq('ATGGTCACTTTTGAGACAGTTATGGAAATTAAAATCCTGCACAAGCAGGGAATG...TGA', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=[])
>>> record.seq[feature.start:feature.end]
Seq('ATGGTCACTTTTGAGACAGTTATGGAAATTAAAATCCTGCACAAGCAGGGAATG...TGA', IUPACAmbiguousDNA())

Patch to follow.

---Files--------------------------------
sf_start_end.patch (519 Bytes)

From p.j.a.cock at googlemail.com Mon Dec 5 18:17:34 2016
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 5 Dec 2016 18:17:34 +0000
Subject: [Biopython-dev] Planning to drop Python 2 support by 2020?
Message-ID:

Dear fellow Biopythoneers,

Our next release (Biopython 1.69) drops Python 2.6 support, and will target Python 2.7 and Python 3.3 onwards.

While Python 2.7 support will continue in the short to medium term, the Python team themselves currently plan to stop support by 2020. That is only three or four more years, and seems a sensible upper limit for how long Biopython continues to support Python 2.7.
Furthermore, NumPy (which a lot of our code depends on) and other high-profile relevant projects also intend to drop their Python 2.7 support by 2020 as advertised on this campaign site:

http://www.python3statement.org/

Does anyone object to adopting this goal for Biopython, and adding the project to http://www.python3statement.org/ ?

Thanks,

Peter

From popantrop at gmail.com Mon Dec 12 12:46:16 2016
From: popantrop at gmail.com (Adam Kurkiewicz)
Date: Mon, 12 Dec 2016 12:46:16 +0000
Subject: [Biopython-dev] Status & scope of the git pre-commit hook?
Message-ID:

What is the status and the scope of the git pre-commit hook in biopython?

I found an issue raised here, but I can't tell if this is work in progress or something accomplished: https://github.com/biopython/biopython/issues/883.

I couldn't find the hook in the project files.

Ideally, what I'm looking for is a code snippet, which will:

- run test cases for the files I changed since the last commit (not all the files -- too slow).
- run pep8 on the files I changed since the last commit (not all the files -- too slow).
- run itself automatically before committing, but also allow me to run it manually.

The motivation is to cut down on the time spent waiting for test cases to pass every time I do some work on the codebase, and to prevent me from committing invalid code.

I've eventually (poorly) written one myself (based on a snippet passed by Peter), which is suitable for what I'm doing with biopython just now:

#!/bin/sh
set -xe
unset PYTHONPATH

# these are hard coded, which isn't nice:
python2 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
python3 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
pypy Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile

# these should not be so generic (just check the files I changed, not all the files!)
# Checking all the files is very slow (20+ seconds on my system)
pep8 --max-line-length 92 BioSQL/
pep8 --ignore E402 --max-line-length 90 Scripts/
pep8 --max-line-length 90 Doc/examples/
pep8 --ignore E122,E123,E126,E127,E128,E129,E501,E731 Bio/
pep8 --ignore E122,E123,E126,E127,E128,E241,E402,E501,E731 Tests/

Before writing this script I've tried installing tox locally, but that didn't go very well and I gave up. Also, it doesn't seem that running tox would achieve what I want -- a lightweight way to run relevant testcases and lint relevant files.

I've also tried running a full test suite, but it takes too much time to be run regularly.

Adam

From p.j.a.cock at googlemail.com Thu Dec 15 16:19:31 2016
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 15 Dec 2016 16:19:31 +0000
Subject: [Biopython-dev] Status & scope of the git pre-commit hook?
In-Reply-To:
References:
Message-ID:

Hi Adam,

TL;DR - we have some gaps in our developer facing documentation, but most of what you want is possible.

On Mon, Dec 12, 2016 at 12:46 PM, Adam Kurkiewicz wrote:
> What is the status and the scope of the git pre-commit
> hook in biopython?

It works as described on https://github.com/biopython/biopython/issues/493

I like it and would like to encourage all our developers and contributors to use it - but since you can't add this directly to the git repository settings, it has to be a documentation issue where we tell people to do it. Since turning on PEP8 checking with TravisCI this is more important, thus issue 883.

> I found an issue raised here, but I can't tell if this is work in
> progress or something accomplished:
> https://github.com/biopython/biopython/issues/883.

That's a wider issue about the TravisCI/Tox/PEP8 linting settings and how to make them more accessible (e.g. for your exact use case).

> I couldn't find the hook in the project files.
See https://github.com/biopython/biopython/issues/493

> Ideally, what I'm looking for is a code snippet, which will:
>
> - run test cases for the files I changed since the last commit (not
> all the files -- too slow).

Nearly impossible? The internal dependencies of Biopython would make that quite a challenge. But perhaps someone has already solved this by inspecting the imports of each test file, and the import dependencies themselves?

> - run pep8 on the files I changed since the last commit (not all the
> files -- too slow).

Easy - that's what the recommended git pre-commit hook does.

> - run itself automatically before committing, but also allow me to run
> it manually.

Running the hook script explicitly would do that, e.g.

.git/hooks/pre-commit

> The motivation is to cut down on the time spent waiting for test cases
> to pass every time I do some work on the codebase, and to prevent me
> from committing invalid code.

Exactly :)

> I've eventually (poorly) written one myself (based on a snippet passed
> by Peter), which is suitable for what I'm doing with biopython just
> now:
>
> #!/bin/sh
> set -xe
> unset PYTHONPATH
>
> # these are hard coded, which isn't nice:
> python2 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
> python3 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
> pypy Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
>
> # these should not be so generic (just check the files I changed, not all the files!)
> # Checking all the files is very slow (20+ seconds on my system)
> pep8 --max-line-length 92 BioSQL/
> pep8 --ignore E402 --max-line-length 90 Scripts/
> pep8 --max-line-length 90 Doc/examples/
> pep8 --ignore E122,E123,E126,E127,E128,E129,E501,E731 Bio/
> pep8 --ignore E122,E123,E126,E127,E128,E241,E402,E501,E731 Tests/
>
> Before writing this script I've tried installing tox locally, but that
> didn't go very well and I gave up.
> Also, it doesn't seem that running tox would achieve what I want -- a
> lightweight way to run relevant testcases and lint relevant files.

Tox does try to solve most of what you're doing, but it would need to be combined with the magic that the git pre-commit hook uses to know which files have changed for linting. That is possible with clever configuration of the commands we tell tox to run (using the same approach as the hook uses).

(As noted above, running only tests of interest is very hard)

> I've also tried running a full test suite, but it takes too much time
> to be run regularly.
>
> Adam

What I do is work on feature branches, using the pre-commit hook, and test at least the modules I expect to be affected locally. Then I push to GitHub and, because I have TravisCI set up on my personal account too, that runs the (offline) full test suite on multiple versions of Python. It does take a while, but if that works it would work on a pull request (or direct commit to the master branch).

Adding a summary of your use case to #883 would be great, as would pull requests to the relevant documentation:

https://github.com/biopython/biopython/blob/master/Doc/Tutorial/chapter_testing.tex
https://github.com/biopython/biopython.github.io/blob/master/wiki/Contributing.md

I'm on leave at the moment (and ought to be asleep right now), so I won't tackle this myself immediately, but I am still trying to review pull requests.

Thanks!

Peter
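
The "lint only what changed" part of this thread could look roughly like the following. This is a sketch under stated assumptions, not the hook from issue 493 (which is not reproduced in this thread): it assumes that git diff --cached --name-only --diff-filter=ACM is an acceptable way to list staged files, that a pep8 executable is on the PATH, and it reuses the per-directory options from Adam's script. The names PEP8_OPTIONS, staged_python_files, pep8_commands and main are invented for this example.

```python
import subprocess

# Per-directory pep8 options, copied from Adam's script earlier in the thread.
PEP8_OPTIONS = {
    "BioSQL/": ["--max-line-length", "92"],
    "Scripts/": ["--ignore", "E402", "--max-line-length", "90"],
    "Doc/examples/": ["--max-line-length", "90"],
    "Bio/": ["--ignore", "E122,E123,E126,E127,E128,E129,E501,E731"],
    "Tests/": ["--ignore", "E122,E123,E126,E127,E128,E241,E402,E501,E731"],
}


def staged_python_files():
    """Return staged (added, copied or modified) .py paths, as reported by git."""
    output = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        universal_newlines=True,
    )
    return [line for line in output.splitlines() if line.endswith(".py")]


def pep8_commands(paths):
    """Group paths by directory prefix and build one pep8 command per group.

    Paths outside the known directories are skipped (an assumption made
    here, matching the directories Adam's script checks).
    """
    grouped = {}
    for path in paths:
        for prefix in PEP8_OPTIONS:
            if path.startswith(prefix):
                grouped.setdefault(prefix, []).append(path)
                break
    return [
        ["pep8"] + PEP8_OPTIONS[prefix] + files
        for prefix, files in sorted(grouped.items())
    ]


def main():
    """Run pep8 on the staged files; return non-zero if any check fails."""
    status = 0
    for command in pep8_commands(staged_python_files()):
        status |= subprocess.call(command)
    return status
```

To try it as a hook, one would save this as .git/hooks/pre-commit with a suitable Python shebang line, mark it executable, and end the file with a call such as raise SystemExit(main()); running that file by hand then gives the manual mode as well. Selecting only the relevant tests is deliberately not attempted, for the reason given above.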