[Biopython-dev] SeqFeature comparison for equality

Andrea Pierleoni andrea at biocomp.unibo.it
Tue Oct 18 12:59:05 UTC 2011


Hi,
I don't know if this helps, but I've been subclassing SeqFeature and
SeqRecord objects to test for equality.
I've attached the very simple code for SeqFeature equality.
Handling complex location equalities with a given set of rules could be
misleading: for me, a feature starting at position 11 is different from
one starting at position 12.
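
As a rough sketch of the idea (not the attached file itself), strict
equality could look like the following. `Location` and `ComparableFeature`
are hypothetical stand-in classes used so the example runs on its own;
they are not Biopython's classes, and the attribute names are assumptions.

```python
class Location:
    """Hypothetical stand-in for a feature location; not Biopython's class."""

    def __init__(self, start, end, strand=1):
        self.start = start
        self.end = end
        self.strand = strand

    def __eq__(self, other):
        if not isinstance(other, Location):
            return NotImplemented
        return (self.start, self.end, self.strand) == (
            other.start, other.end, other.strand)


class ComparableFeature:
    """Hypothetical stand-in for a SeqFeature subclass with strict equality."""

    def __init__(self, location, feature_type, feature_id="<unknown id>"):
        self.location = location
        self.type = feature_type
        self.id = feature_id

    def __eq__(self, other):
        if not isinstance(other, ComparableFeature):
            return NotImplemented
        # No fuzzy rules: type, ID and exact coordinates must all match.
        return (self.type == other.type
                and self.id == other.id
                and self.location == other.location)


at_11 = ComparableFeature(Location(11, 50), "CDS")
also_at_11 = ComparableFeature(Location(11, 50), "CDS")
at_12 = ComparableFeature(Location(12, 50), "CDS")

print(at_11 == also_at_11)  # True
print(at_11 == at_12)       # False
```

With this rule a feature starting at 11 never equals one starting at 12,
whatever kind of position object describes the start.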

Andrea


> ------------------------------
>
> Message: 4
> Date: Mon, 17 Oct 2011 12:57:53 -0500
> From: Joshua Ismael Haase Hernández <hahj87 at gmail.com>
> Subject: Re: [Biopython-dev] [biopython-dev] SeqFeature comparison for
> 	equality
> To: Peter Cock <p.j.a.cock at googlemail.com>
> Cc: biopython-dev at biopython.org
> Message-ID:
> 	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 17 October 2011 at 12:15, Peter Cock
> <p.j.a.cock at googlemail.com> wrote:
>
>> Hi Joshua,
>>
>> Could you CC the biopython-dev mailing list, unless you
>> specifically want to discuss something in private?
>>
>
> Sorry about that, I thought I was replying to the mailing list.
>
>>
>> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>> > I'm on it.
>> >
>> > Will add __eq__ to FeatureLocation on trunk.
>>
>> Great.
>>
>> In the short term, you can just work on it directly with a copy of the
>> official repository and send me a patch (use git diff > file.patch)
>>
>> The "best" way is to fork biopython on github, and create your
>> own branch with these changes.
>>
>> > I think BeforeLocation should check if the second is before,
>> > After check if it is after, etc, and this can be done in locations.
>> >
>> > Before I implement those: do you agree?
>> >
>> > In that case, AbstractLocation instances
>> > should check if ExactLocation instances are
>> > inside their range, and AbstractLocation
>> > instances to be exactly the same.
>>
>>
> These positions would be the same:
>
> OneOfPosition(5, 11, 15),
> ExactPosition(11),
> AfterPosition(4),
> BeforePosition(16),
> WithinPosition(5, 16),
>
>
>> No. Having tried this myself, it is very complicated.
>>
>
> I think I'm missing something. Why is it hard?
> I see it as a list of cases.
>
>
>> Also, there are constraints with the Python language
>> about equality, hashing and comparisons (e.g. for
>> membership in lists, or use as dictionary keys).
>>
>
> I don't think anyone should use Features as dictionary keys;
> they will use the Feature ID for that. But maybe someone wants a
> set of features (which right now is like a list of all sequences)...
>
> In which cases would that be a problem? (I'm a biotechnology
> engineer, so I don't see all the caveats, and I don't really have
> a deep understanding of how Python works.)
>
>> The current behaviour of simple comparison of
>> the positions as an integer is at least simple.
>>
>> > About SeqFeature, I think they should be
>> > the same if they share all locations.
>>
>> You don't care about feature type and ID?  ;)
>>
>
> Maybe not. A comparison could skip iterating over
> the locations if we have the same type and ID. I'm
> still not sure that's a good method (thus the comment
> "# Can we trust this?" in my patch), but a 'CDS'
> feature is sometimes equivalent to an 'mRNA' feature,
> and in that case both ID and type would differ
> between the SeqFeatures.
>
>>
>> Peter
>>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 17 Oct 2011 19:07:27 +0100
> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] [biopython-dev] SeqFeature comparison for
> 	equality
> To: Joshua Ismael Haase Hernández <hahj87 at gmail.com>
> Cc: biopython-dev at biopython.org
> Message-ID:
> 	<CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>> ...
>>
>> These positions would be the same:
>>
>> OneOfPosition(5, 11, 15),
>> ExactPosition(11),
>> AfterPosition(4),
>> BeforePosition(16),
>> WithinPosition(5, 16),
>
> I don't understand what you are asking here. Those
> positions do not look the same to me.
>
>>>
>>> No. Having tried this myself, it is very complicated.
>>
>> I think I'm missing something. Why is it hard?
>> I see it as a list of cases.
>
> Well, try it and write lots of unit tests, and I'll review it.
>
>>>
>>> Also, there are constraints with the Python language
>>> about equality, hashing and comparisons (e.g. for
>>> membership in lists, or use as dictionary keys).
>>
>> I don't think anyone should use Features as dictionary keys;
>> they will use the Feature ID for that. But maybe someone wants a
>> set of features (which right now is like a list of all sequences)...
>>
>> In which cases would that be a problem? (I'm a biotechnology
>> engineer, so I don't see all the caveats, and I don't really have
>> a deep understanding of how Python works.)
>
> Using positions as dictionary keys seems reasonable.
>
> Using a SeqFeature as a key is not possible as they
> are mutable objects.
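
To illustrate the constraint, here is a minimal sketch assuming Python 3
semantics. `EqOnly` and `MutableFeature` are hypothetical classes written
for this example, not Biopython's SeqFeature. It shows two things: a class
that defines `__eq__` without `__hash__` becomes unhashable, and a mutable
object whose hash follows its contents gets "lost" in a dictionary once it
is changed.

```python
class EqOnly:
    """Defines __eq__ but not __hash__; Python 3 then makes it unhashable."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


try:
    {EqOnly(): "value"}
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False
print(eq_only_hashable)  # False


class MutableFeature:
    """Hypothetical mutable feature whose hash follows its coordinates."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __eq__(self, other):
        if not isinstance(other, MutableFeature):
            return NotImplemented
        return (self.start, self.end) == (other.start, other.end)

    def __hash__(self):
        return hash((self.start, self.end))


feature = MutableFeature(11, 50)
lookup = {feature: "annotation"}
print(MutableFeature(11, 50) in lookup)  # True

feature.start = 12  # mutate the key after it was stored
print(MutableFeature(11, 50) in lookup)  # False: stored key no longer equals it
print(MutableFeature(12, 50) in lookup)  # False: stored under the old hash
print(feature in list(lookup))           # True: the key is still in there
```

The last three lines show why value-based hashing and mutability do not
mix: the entry still exists but can no longer be found by value.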
>
>>> The current behaviour of simple comparison of
>>> the positions as an integer is at least simple.
>>>
>>> > About SeqFeature, I think they should be
>>> > the same if they share all locations.
>>>
>>> You don't care about feature type and ID?  ;)
>>
>> Maybe not. A comparison could skip iterating over
>> the locations if we have the same type and ID. I'm
>> still not sure that's a good method (thus the comment
>> "# Can we trust this?" in my patch), but a 'CDS'
>> feature is sometimes equivalent to an 'mRNA' feature,
>> and in that case both ID and type would differ
>> between the SeqFeatures.
>
> A gene, mRNA and CDS might all have the same
> position, but they are different features.
>
> Peter
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 17 Oct 2011 13:27:19 -0500
> From: Joshua Ismael Haase Hernández <hahj87 at gmail.com>
> Subject: Re: [Biopython-dev] [biopython-dev] SeqFeature comparison for
> 	equality
> To: Peter Cock <p.j.a.cock at googlemail.com>
> Cc: biopython-dev at biopython.org
> Message-ID:
> 	<CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 17 October 2011 at 13:07, Peter Cock
> <p.j.a.cock at googlemail.com> wrote:
>
>> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>> > ...
>> >
>> > These positions would be the same:
>> >
>> > OneOfPosition(5, 11, 15),
>> > ExactPosition(11),
>> > AfterPosition(4),
>> > BeforePosition(16),
>> > WithinPosition(5, 16),
>>
>> I don't understand what you are asking here. Those
>> positions do not look the same to me.
>>
>>
> They are not *exactly* the same, but besides
> AfterPosition and BeforePosition,
> ExactPosition(11) is included in OneOfPosition(5, 11, 15),
> ExactPosition(11) is after AfterPosition(4)
> ExactPosition(11) is before BeforePosition(16)
> ExactPosition(11) is included in WithinPosition(5, 16)
> All positions in OneOfPosition are before BeforePosition,
> after AfterPosition, and within WithinPosition, and they include
> ExactPosition.
> All positions in WithinPosition are after AfterPosition and
> before BeforePosition.
>
> BeforePosition and AfterPosition can't be equal.
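
One way to see where a rule like this gets complicated is a small sketch.
It assumes a hypothetical `FuzzyPosition` class and an assumed rule,
`fuzzy_equal`, under which two positions are "equal" when they could denote
the same coordinate. Neither is part of Biopython; they only model the
inclusion cases listed above.

```python
class FuzzyPosition:
    """Hypothetical position described by the set of coordinates it may take."""

    def __init__(self, *candidates):
        self.candidates = frozenset(candidates)


def fuzzy_equal(a, b):
    """Assumed rule: 'equal' when both could denote the same coordinate."""
    return bool(a.candidates & b.candidates)


exact_11 = FuzzyPosition(11)           # like ExactPosition(11)
exact_5 = FuzzyPosition(5)             # like ExactPosition(5)
one_of = FuzzyPosition(5, 11, 15)      # like OneOfPosition(5, 11, 15)

print(fuzzy_equal(exact_11, one_of))   # True
print(fuzzy_equal(exact_5, one_of))    # True
print(fuzzy_equal(exact_11, exact_5))  # False, so the rule is not transitive
```

Because the rule is not transitive, it cannot back a well-behaved `__eq__`:
objects that compare equal must share a hash, and here 5 and 11 would both
have to hash like the OneOf position while still differing from each other.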
>
> How should I name the TestCases?
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 17 Oct 2011 20:03:15 +0100
> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] [biopython-dev] SeqFeature comparison for
> 	equality
> To: Joshua Ismael Haase Hernández <hahj87 at gmail.com>
> Cc: biopython-dev at biopython.org
> Message-ID:
> 	<CAKVJ-_6DRB5aN0WzuVoNF7iufFdamkOdEhZxJot=yYWR5dMZwg at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>>
>>
>> On 17 October 2011 at 13:07, Peter Cock <p.j.a.cock at googlemail.com>
>> wrote:
>>>
>>> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>>> > ...
>>> >
>>> > These positions would be the same:
>>> >
>>> > OneOfPosition(5, 11, 15),
>>> > ExactPosition(11),
>>> > AfterPosition(4),
>>> > BeforePosition(16),
>>> > WithinPosition(5, 16),
>>>
>>> I don't understand what you are asking here. Those
>>> positions do not look the same to me.
>>>
>>
>> They are not *exactly* the same, but besides
>> AfterPosition and BeforePosition,
>> ExactPosition(11) is included in OneOfPosition(5, 11, 15),
>> ExactPosition(11) is after AfterPosition(4)
>> ExactPosition(11) is before BeforePosition(16)
>> ExactPosition(11) is included in WithinPosition(5, 16)
>> All positions in OneOfPosition are before BeforePosition,
>> after AfterPosition, and within WithinPosition, and they include
>> ExactPosition.
>> All positions in WithinPosition are after AfterPosition and
>> before BeforePosition.
>> BeforePosition and AfterPosition can't be equal.
>>
>
> It might help if you wrote these out explicitly,
> e.g. currently:
>
>     >>> from Bio.SeqFeature import *
>     >>> a = BeforePosition(10)
>     >>> b = AfterPosition(10)
>     >>> a == b == 10
>     True
>
> Currently BeforePosition and AfterPosition act like
> the integer position for comparison etc. I find this
> reasonable given we have to treat them as the
> integer for things like extracting the sequence.
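
The integer-like behaviour described here can be sketched without Biopython.
`IntBefore` and `IntAfter` are hypothetical stand-ins that subclass `int`;
this is an assumption made for illustration and says nothing about how
Biopython implements its position classes.

```python
class IntBefore(int):
    """Hypothetical stand-in: an integer coordinate displayed as '<N'."""

    def __str__(self):
        return "<%d" % int(self)


class IntAfter(int):
    """Hypothetical stand-in: an integer coordinate displayed as '>N'."""

    def __str__(self):
        return ">%d" % int(self)


a = IntBefore(10)
b = IntAfter(10)
print(a == b == 10)            # True
print(str(a), str(b))          # <10 >10
print("ACGTACGTACGTACGT"[:a])  # ACGTACGTAC
print(len({a, b, 10}))         # 1
```

Comparison, hashing and slicing all fall back to the plain integer, so the
fuzziness only shows up when the position is displayed.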
>
>> How should I name the TestCases?
>>
>
> Something like test_SeqFeature.py and using
> unittest. Most existing tests in this area are in
> doctests and test_SeqIO_feature.py
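
A unittest-based file along those lines might be laid out as in this sketch.
`IntPosition` is a hypothetical stand-in so the example is self-contained,
and the test names and the `run_tests` helper are assumptions; a real
test_SeqFeature.py would import the classes under test from Bio.SeqFeature.

```python
import unittest


class IntPosition(int):
    """Hypothetical stand-in for a position that compares as an integer."""


class PositionComparisonTests(unittest.TestCase):
    def test_same_coordinate_is_equal(self):
        self.assertEqual(IntPosition(10), IntPosition(10))
        self.assertEqual(IntPosition(10), 10)

    def test_adjacent_coordinates_differ(self):
        self.assertNotEqual(IntPosition(11), IntPosition(12))

    def test_usable_as_dictionary_key(self):
        self.assertEqual({IntPosition(10): "x"}[10], "x")


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(PositionComparisonTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` returns a result whose `wasSuccessful()` method
reports whether every case passed.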
>
> Peter
>
>
>
> ------------------------------
>
>
> End of Biopython-dev Digest, Vol 105, Issue 15
> **********************************************
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqfeature_eq.py
Type: text/x-python-script
Size: 1505 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20111018/d7133fd9/attachment-0002.bin>

