From redmine at redmine.open-bio.org  Thu Dec  1 12:08:47 2016
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 01 Dec 2016 12:08:47 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3074]
 (Migrated) Please support additional fields in the SeqIO embl parser
References: <redmine.issue-3074.20100507104701@redmine.open-bio.org>
Message-ID: <redmine.journal-15385.20161201120846.9a030bc4d4e8da0b@redmine.open-bio.org>

Issue #3074 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated

While the DR and KW lines have been done, the DT lines are still currently ignored. Filed as a new issue on GitHub (since we are retiring the RedMine issue tracker), see:

https://github.com/biopython/biopython/issues/1016

----------------------------------------
Bug #3074: Please support additional fields in the SeqIO embl parser
https://redmine.open-bio.org/issues/3074#change-15385

* Author: Wim De
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.53
* URL: 
----------------------------------------
Sequences returned from the Bio.SeqIO parser for 'embl' files don't contain a parsed version of at least the following fields:
DT (date)
DR (database cross references)

Possibly also missing:
KW the keywords field
dataclass field in the ID field

It would be useful to me, and I imagine to others, to have access to these additional fields that are in the original embl files. Without them, anyone who parses an embl file, manipulates the sequence and writes out the result either loses this data or has to add the original data back into the file manually.
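
As an illustration of the fields in question, here is a minimal sketch that collects the DT, DR and KW lines from the text of one EMBL record using only the standard library. It is not the Bio.SeqIO parser and does not show how Biopython stores these values; the function names and the sample record are hypothetical. It assumes the usual flat-file layout of a two-letter line code followed by three spaces.

```python
from collections import defaultdict


def collect_embl_fields(text, wanted=("DT", "DR", "KW")):
    """Return {line code: [line contents]} for the wanted two-letter codes."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Columns 1-2 hold the line code; the content starts at column 6.
        code, content = line[:2], line[5:].strip()
        if code in wanted and content:
            fields[code].append(content)
    return dict(fields)


def split_keywords(kw_lines):
    """Join KW lines, drop the final '.', and split on ';'."""
    joined = " ".join(kw_lines).rstrip(".")
    return [kw.strip() for kw in joined.split(";") if kw.strip()]


# Hypothetical record, made up for this sketch.
SAMPLE = """\
ID   AB000001; SV 1; linear; mRNA; STD; PLN; 60 BP.
DT   01-JAN-2000 (Rel. 1, Created)
DT   02-FEB-2001 (Rel. 2, Last updated, Version 2)
KW   kinase; example protein.
DR   ExampleDB; EX0001; demo.
//
"""

fields = collect_embl_fields(SAMPLE)
print(fields["DT"])
print(split_keywords(fields["KW"]))
```

Keeping the raw line contents per code means nothing is lost if the record is later written back out, which is the concern raised above.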


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org  Thu Dec  1 12:14:30 2016
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 01 Dec 2016 12:14:30 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3060]
 (Migrated) Add ungap method to the SeqRecord?
References: <redmine.issue-3060.20100421120750@redmine.open-bio.org>
Message-ID: <redmine.journal-15387.20161201121430.1cdbbc22eab11b5a@redmine.open-bio.org>

Issue #3060 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1017

Moved to GitHub as https://github.com/biopython/biopython/issues/1017

----------------------------------------
Bug #3060: Add ungap method to the SeqRecord?
https://redmine.open-bio.org/issues/3060#change-15387

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.54b
* URL: https://github.com/biopython/biopython/issues/1017
----------------------------------------
Biopython 1.53 added an ungap method to the Seq object.

This is a possible enhancement request to add a matching ungap method to the SeqRecord object, where the per-letter-annotation and features should be adjusted to match.

My motivating example is to take an ACE file loaded with SeqIO, remove the gaps, and output the contigs as FASTQ or QUAL files. This requires the per-letter-annotation to be sliced to match the ungapped sequence.

Likewise any features fully contained within ungapped regions should be retained and their co-ordinates shifted. I'm not sure if we should do anything about features spanning a gap - the simple option which I have implemented is they are lost. This is done via the existing SeqRecord slicing and addition code.

Patch to follow...

See also Bug 3054 for adding upper and lower methods to the SeqRecord, and the broader discussion on Bug 2351 about strings, Seq and SeqRecord objects.
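
The behaviour described above can be sketched with plain strings and lists. This is not the attached patch and does not use the SeqRecord API; the function names are hypothetical. It assumes per-letter annotations are sequences of the same length as the gapped sequence, and it follows the simple option of dropping any feature that spans a gap.

```python
def ungap_with_annotations(seq, letter_annotations, gap="-"):
    """Remove gap characters and slice each per-letter annotation to match."""
    for key, values in letter_annotations.items():
        if len(values) != len(seq):
            raise ValueError("annotation %r does not match sequence length" % key)
    # Indices of the letters that survive ungapping.
    keep = [i for i, letter in enumerate(seq) if letter != gap]
    new_seq = "".join(seq[i] for i in keep)
    new_annotations = {
        key: [values[i] for i in keep]
        for key, values in letter_annotations.items()
    }
    return new_seq, new_annotations


def shift_feature(start, end, seq, gap="-"):
    """Map a [start:end) feature to ungapped coordinates, or None if it spans a gap."""
    if gap in seq[start:end]:
        return None
    # Every gap before the feature shifts it one position to the left.
    offset = seq[:start].count(gap)
    return start - offset, end - offset


gapped = "AC-GT--A"
quals = {"phred_quality": [30, 31, 0, 32, 33, 0, 0, 34]}
print(ungap_with_annotations(gapped, quals))
print(shift_feature(3, 5, gapped))
```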

---Files--------------------------------
seqrecord_ungap.patch (4.74 KB)



From redmine at redmine.open-bio.org  Thu Dec  1 12:16:38 2016
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 01 Dec 2016 12:16:38 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2818]
 (Resolved) Add start and end properties to SeqFeature object
References: <redmine.issue-2818.20090421081304@redmine.open-bio.org>
Message-ID: <redmine.journal-15388.20161201121638.b3e802367307dfe2@redmine.open-bio.org>

Issue #2818 has been updated by Peter Cock.

Status changed from New to Resolved
% Done changed from 0 to 100

Resolved with changes to Biopython since, including the new CompoundLocation object for joins, and making the position objects more integer-like.

----------------------------------------
Bug #2818: Add start and end properties to SeqFeature object
https://redmine.open-bio.org/issues/2818#change-15388

* Author: Peter Cock
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL: 
----------------------------------------
An enhancement proposed on the mailing list would add start and end properties to the SeqFeature returning plain integers (non-fuzzy approximations to the start and end locations) suitable for slicing most parent sequences.  Dealing with a join location would still be tricky.

Example usage:

>>> from Bio import SeqIO
>>> record = SeqIO.read(open("NC_005816.gb"),"gb")
>>> feature = record.features[2]
>>> print feature
type: gene
location: [86:1109]
ref: None:None
strand: 1
qualifiers: 
	Key: db_xref, Value: ['GeneID:2767718']
	Key: locus_tag, Value: ['YP_pPCP01']

>>> record[feature.start:feature.end]
SeqRecord(seq=Seq('ATGGTCACTTTTGAGACAGTTATGGAAATTAAAATCCTGCACAAGCAGGGAATG...TGA', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=[])
>>> record.seq[feature.start:feature.end]
Seq('ATGGTCACTTTTGAGACAGTTATGGAAATTAAAATCCTGCACAAGCAGGGAATG...TGA', IUPACAmbiguousDNA())

Patch to follow.

---Files--------------------------------
sf_start_end.patch (519 Bytes)



From p.j.a.cock at googlemail.com  Mon Dec  5 18:17:34 2016
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 5 Dec 2016 18:17:34 +0000
Subject: [Biopython-dev] Planning to drop Python 2 support by 2020?
Message-ID: <CAKVJ-_6kN4PuCyTum50TT53CnWNrkjd+w-xo3Nw3+W5AdJGbjQ@mail.gmail.com>

Dear fellow Biopythoneers,

Our next release (Biopython 1.69) drops Python 2.6 support, and will
target Python 2.7 and Python 3.3 onwards.

While Python 2.7 support will continue in the short to medium term,
the Python team themselves currently plan to stop support by 2020.
That is only three or four more years, and seems a sensible upper
limit for how long Biopython continues to support Python 2.7.

Furthermore, NumPy (which a lot of our code depends on) and
other high profile relevant projects also intend to drop their
Python 2.7 support by 2020 as advertised on this campaign
site: http://www.python3statement.org/

Does anyone object to adopting this goal for Biopython, and
adding the project to http://www.python3statement.org/ ?

Thanks,

Peter

From popantrop at gmail.com  Mon Dec 12 12:46:16 2016
From: popantrop at gmail.com (Adam Kurkiewicz)
Date: Mon, 12 Dec 2016 12:46:16 +0000
Subject: [Biopython-dev] Status & scope of the git pre-commit hook?
Message-ID: <CAA2SmNerQN+xvOy2kGB7++T02aXkOpTZ+oOxz70cxrMLevpPdg@mail.gmail.com>

What is the status and the scope of the git pre-commit hook in biopython?

I found an issue raised here, but I can't tell if this is work in
progress or something accomplished:
https://github.com/biopython/biopython/issues/883. I couldn't find the
hook in the project files.

Ideally, what I'm looking for is a code snippet, which will:

- run test cases for the files I changed since the last commit (not
all the files -- too slow).
- run pep8 on the files I changed since the last commit (not all the
files -- too slow).
- run itself automatically before committing, but also allow me to run
it manually.

The motivation is to cut down on the time spent waiting for test cases
to pass every time I do some work on the codebase, and to prevent me
from committing invalid code.

I've eventually (poorly) written one myself (based on a snippet passed
by Peter), which is suitable for what I'm doing with biopython just
now:

    #!/bin/sh
    set -xe
    unset PYTHONPATH

    # these are hard coded, which isn't nice:
    python2 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
    python3 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
    pypy Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile

    # these should not be so generic (just check the files I changed,
    # not all the files!)
    # Checking all the files is very slow (20+ seconds on my system)
    pep8 --max-line-length 92 BioSQL/
    pep8 --ignore E402 --max-line-length 90 Scripts/
    pep8 --max-line-length 90 Doc/examples/
    pep8 --ignore E122,E123,E126,E127,E128,E129,E501,E731 Bio/
    pep8 --ignore E122,E123,E126,E127,E128,E241,E402,E501,E731 Tests/

Before writing this script I've tried installing tox locally, but that
didn't go very well and I gave up. Also, it doesn't seem that running
tox would achieve what I want -- a lightweight way to run relevant
testcases and lint relevant files.

I've also tried running a full test suite, but it takes too much time
to be run regularly.

Adam

From p.j.a.cock at googlemail.com  Thu Dec 15 16:19:31 2016
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 15 Dec 2016 16:19:31 +0000
Subject: [Biopython-dev] Status & scope of the git pre-commit hook?
In-Reply-To: <CAA2SmNerQN+xvOy2kGB7++T02aXkOpTZ+oOxz70cxrMLevpPdg@mail.gmail.com>
References: <CAA2SmNerQN+xvOy2kGB7++T02aXkOpTZ+oOxz70cxrMLevpPdg@mail.gmail.com>
Message-ID: <CAKVJ-_7j6Fw1-mNPpoD08_-HcjhW-Xi60fYgLMNQmxsAHXYgYg@mail.gmail.com>

Hi Adam,

TL;DR - we have some gaps in our developer facing
documentation, but most of what you want is possible.

On Mon, Dec 12, 2016 at 12:46 PM, Adam Kurkiewicz <popantrop at gmail.com> wrote:
> What is the status and the scope of the git pre-commit
> hook in biopython?

It works as described on
https://github.com/biopython/biopython/issues/493

I like it and would like to encourage all our developers and
contributors to use it - but since you can't add this directly
to the git repository settings, it has to be a documentation
issue where we tell people to do it.

Since turning on PEP8 checking with TravisCI this is more
important, thus issue 883.

> I found an issue raised here, but I can't tell if this is work in
> progress or something accomplished:
> https://github.com/biopython/biopython/issues/883.

That's a wider issue about the TravisCI/Tox/PEP8 linting
settings and how to make them more accessible (e.g.
for your exact use case).

> I couldn't find the hook in the project files.

See https://github.com/biopython/biopython/issues/493

> Ideally, what I'm looking for is a code snippet, which will:
>
> - run test cases for the files I changed since the last commit (not
> all the files -- too slow).

Nearly impossible? The internal dependencies of Biopython
would make that quite a challenge. But perhaps someone
has already solved this by inspecting the imports of each
test file, and the import dependencies themselves?

> - run pep8 on the files I changed since the last commit (not all the
> files -- too slow).

Easy - that's what the recommended git pre-commit hook does.
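
The general idea can be sketched as below. This is not the hook from issue 493, which is not reproduced here; it is a minimal sketch that assumes git and the pep8 command are on the PATH, and the function names are hypothetical. It asks git for the staged files and passes only the Python ones to the linter.

```python
import subprocess


def python_files(paths):
    """Keep only the paths that look like Python source files."""
    return [p for p in paths if p.endswith(".py")]


def staged_paths():
    """Paths added, copied or modified in the index, as listed by git."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"]
    )
    return out.decode().splitlines()


def lint_staged(linter=("pep8",)):
    """Run the linter on staged Python files; return its exit status (0 if none)."""
    files = python_files(staged_paths())
    if not files:
        return 0
    # A non-zero status from the linter makes git abort the commit
    # when this is used as the pre-commit hook's exit code.
    return subprocess.call(list(linter) + files)


print(python_files(["Bio/Affy/CelFile.py", "README.rst", "Tests/test_Affy.py"]))
```

To try it as a hook, the script would need a Python shebang line, an executable .git/hooks/pre-commit location, and a final call such as sys.exit(lint_staged()). Per-directory options like the --ignore lists quoted in this thread would have to be added to the linter command.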

> - run itself automatically before committing, but also allow me to run
> it manually.

Running the hook script explicitly would do that, e.g.

.git/hooks/pre-commit

> The motivation is to cut down on the time spent waiting for test cases
> to pass every time I do some work on the codebase, and to prevent me
> from committing invalid code.

Exactly :)

> I've eventually (poorly) written one myself (based on a snippet passed
> by Peter), which is suitable for what I'm doing with biopython just
> now:
>
>     #!/bin/sh
>     set -xe
>     unset PYTHONPATH
>
>     # these are hard coded, which isn't nice:
>     python2 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
>     python3 Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
>     pypy Tests/run_tests.py test_Affy test_CelFile Bio.Affy Bio.Affy.CelFile
>
>     # these should not be so generic (just check the files I changed,
>     # not all the files!)
>     # Checking all the files is very slow (20+ seconds on my system)
>     pep8 --max-line-length 92 BioSQL/
>     pep8 --ignore E402 --max-line-length 90 Scripts/
>     pep8 --max-line-length 90 Doc/examples/
>     pep8 --ignore E122,E123,E126,E127,E128,E129,E501,E731 Bio/
>     pep8 --ignore E122,E123,E126,E127,E128,E241,E402,E501,E731 Tests/
>
> Before writing this script I've tried installing tox locally, but that
> didn't go very well and I gave up. Also, it doesn't seem that running
> tox would achieve what I want -- a lightweight way to run relevant
> testcases and lint relevant files.

Tox does try to solve most of what you're doing, but would need
to be combined with the magic that the git pre-commit hook uses
to know which files have changed for linting. That is possible
with clever configuration of the commands we tell tox to run
(using the same approach as the hook uses).

(As noted above, running only tests of interest is very hard)

> I've also tried running a full test suite, but it takes too much time
> to be run regularly.
>
> Adam

What I do is work on feature branches, using the pre-commit
hook, and test at least the modules I expect to be affected locally.
Then I push to GitHub, and because I have TravisCI set up on
my personal account too, that runs the (offline) full test
suite on multiple versions of Python. It does take a while,
but if that works it would work on a pull request (or direct
commit to the master branch).

Adding a summary of your use case to #883 would be great,
as would pull requests to the relevant documentation:

https://github.com/biopython/biopython/blob/master/Doc/Tutorial/chapter_testing.tex

https://github.com/biopython/biopython.github.io/blob/master/wiki/Contributing.md

I'm on leave at the moment (and ought to be asleep right now),
so I won't tackle this myself immediately, but I am still trying to
review pull requests.

Thanks!

Peter