[Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally

Chris Fields cjfields at illinois.edu
Mon May 10 16:32:43 UTC 2010


If IDs are assigned dynamically, I would assume you can't compare them between runs, so is_deeply() won't work as advertised: we already know the ID will change from run to run, so the failure is a self-fulfilling prophecy.  Also, is_deeply() here is inspecting the SF::Collection blessed hash directly (the _btree is a tied DB_File hash); I'm not sure that's what you want either.

So at this point I would have to ask myself:

1) Is the dynamic ID assignment a bug (e.g. should we be using a fixed ID of some sort)?  If not, we can't expect these to match across runs, so is_deeply won't work.
2) Would it make more sense to explicitly inspect the handled objects (SF::Collection) directly via method calls?  For instance, if I want to see whether a set of features falls within a region, is that reproducible between runs?

Either way, I'm not sure what using Test::Deep would gain you, as it's still meant to inspect complex data structures, just with a bit more sugar than Test::More's is_deeply().  Per #2 above, I would be more explicit in inspecting the SF::Collection:

my $collection = $contig->get_features_collection;
# check that IDs in SF::Collection conform to a regex using like()
# inspect other things about the collection...
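
To make the idea concrete, here is a minimal sketch of testing through method calls instead of deep-comparing the blessed hash. Mock::Collection and its methods (feature_count, feature_ids) are hypothetical stand-ins invented for illustration, not the real SF::Collection API, and the ID pattern is only an assumed format. In a real test the collection would come from $contig->get_features_collection as above.

```perl
use strict;
use warnings;
use Test::More tests => 5;

# Hypothetical stand-in for a feature collection (NOT the BioPerl API).
# Each instance gets a different internal handle, mimicking the _btree
# value that changes between objects and between runs.
{
    package Mock::Collection;
    my $next_handle = 1000;

    sub new {
        my ($class, @features) = @_;
        return bless {
            _btree    => $next_handle++,    # differs for every object
            _features => [@features],
        }, $class;
    }

    # Number of features held by the collection.
    sub feature_count { return scalar @{ $_[0]{_features} } }

    # Sorted list of feature IDs (call in list context).
    sub feature_ids {
        my ($self) = @_;
        my @ids = sort map { $_->{id} } @{ $self->{_features} };
        return @ids;
    }
}

my @features = ( { id => 'Contig35:1' }, { id => 'Contig35:2' } );
my $in  = Mock::Collection->new(@features);
my $out = Mock::Collection->new(@features);

# The internals differ, so is_deeply( $out, $in ) would fail here...
isnt( $out->{_btree}, $in->{_btree}, 'internal handles differ' );

# ...but what the objects report through their methods is stable.
is( $out->feature_count, $in->feature_count, 'same number of features' );
is_deeply( [ $out->feature_ids ], [ $in->feature_ids ], 'same feature IDs' );

# Where exact values cannot be known, check their shape with like().
like( $_, qr/^Contig\d+:\d+$/, "ID '$_' has the expected form" )
    for $out->feature_ids;
```

The comparison only touches values the objects expose on purpose, so volatile internals never enter the test.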

chris

On May 9, 2010, at 2:26 AM, Florent Angly wrote:

> Chris,
> 
> I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful.
> 
> It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object.
> 
>> is_deeply( $obj_out , $obj_in , 'deep compare' );
> 
>> not ok 1 - deep compare
>> #   Failed test 'deep compare'
>> #   at ./test_roundtrip.pl line 33.
>> #     Structures begin differing at:
>> #     ${     $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592'
>> #     ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512'
>> 1..1
>> # Looks like you failed 1 test of 1.
> 
> 
> And when I re-run this again:
> 
>> not ok 1 - deep compare
>> #   Failed test 'deep compare'
>> #   at ./test_roundtrip.pl line 33.
>> #     Structures begin differing at:
>> #     ${     $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264'
>> #     ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184'
>> 1..1
>> # Looks like you failed 1 test of 1.
> 
> Note how the value of _btree changes every time.
> 
> Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod):
>> Where it becomes more interesting is in allowing you to do something besides simple exact comparisons. With strings, the |eq| operator checks that 2 strings are exactly equal but sometimes that's not what you want. When you don't know exactly what the string should be but you do know some things about how it should look, |eq| is no good and you must use pattern matching instead. Test::Deep provides pattern matching for complex data structures
> 
> Florent
> 
> 
> 
> 
> On 09/05/10 10:02, Chris Fields wrote:
>> Should clarify that: round-tripping to generate the same data structure/object is good and what we want.  Round-tripping to generate the exact same output is not our highest priority.
>> 
>> chris
>> 
>> On May 8, 2010, at 6:47 PM, Chris Fields wrote:
>> 
>>   
>>> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input.  None of the SeqIO modules make that specific promise, simply because it's nearly impossible to maintain, with very little payback.  Round-tripping is fine and all, just not our first priority.
>>> 
>>> chris
>>> 
>>> On May 8, 2010, at 6:34 AM, Florent Angly wrote:
>>> 
>>>     
>>>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files.
>>>> 
>>>> It looks like the Bio::SeqIO module tests could use it as well.
>>>> 
>>>> Cheers,
>>>> 
>>>> Florent
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>       
>>> 
>>>     
>>   
> 




