[Bioperl-l] Bio::ASN1::EntrezGene now on Github

Fields, Christopher J cjfields at illinois.edu
Wed Sep 11 03:50:07 UTC 2013


On Sep 10, 2013, at 12:05 PM, Carnë Draug <carandraug+dev at gmail.com> wrote:

> On 10 September 2013 17:47, Fields, Christopher J <cjfields at illinois.edu> wrote:
>>> ...
>> 
>> …and so you can now, with the magic of github, make the changes directly to the repo
> 
> Yeah. But what I wanted was to discuss the implications of that
> change. It is related to a previous discussion [1]. If memory serves
> me right, this patch would read the whole sequences of a set rather
> than one at a time. I have used this module directly, not through the
> bioperl module, so I'm unsure if it will break things there.

Yep, I recall this now.

> There's also the thing that next_seq() was always returning an array
> reference with only 1 element (the one sequence). Now there will be
> more elements (one per sequence in the Entrezgene set). People
> expecting the old behaviour will be skipping data unless maybe some
> warning is printed or some other change is made. In particular,
> next_seq returning many sequences (the next set) is misleading.

Would you want to keep a method whose name implies it returns only one thing (e.g. next_seq)?  You could add another method that returns data in batches (next_seq_set?  not sure about the name) and then rewrite next_seq in terms of it.
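
A rough sketch of that split, in Python rather than Perl just to show the shape. Everything here is hypothetical: GeneSetReader, the in-memory `sets` argument, and the method bodies are illustrative assumptions, not the Bio::ASN1::EntrezGene API. The batch method does the reading and next_seq simply drains a buffer that it refills from the batch method.

```python
from collections import deque


class GeneSetReader:
    """Hypothetical reader; `sets` stands in for already-parsed Entrezgene sets."""

    def __init__(self, sets):
        self._sets = iter(sets)
        self._pending = deque()  # records of the current set not yet handed out

    def next_seq_set(self):
        """Return the next batch of records as a list, or None when exhausted."""
        if self._pending:
            # A set was partially consumed via next_seq(); return the rest of it.
            batch = list(self._pending)
            self._pending.clear()
            return batch
        batch = next(self._sets, None)
        return None if batch is None else list(batch)

    def next_seq(self):
        """Return exactly one record, or None when exhausted."""
        while not self._pending:
            batch = self.next_seq_set()
            if batch is None:
                return None
            self._pending.extend(batch)  # an empty set just loops again
        return self._pending.popleft()


reader = GeneSetReader([["gene_a", "gene_b"], ["gene_c"]])
first = reader.next_seq()       # one record from the first set
rest = reader.next_seq_set()    # whatever is left of that set
```

With this shape, callers that expect one record per call keep working, and callers that want the whole set ask for it explicitly.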

> Finally, there's also the thing that the patch reads an entire set
> which for all we know, can be thousands of sequences.
> 
> Carnë

That may be more problematic, yes.  The solutions depend on how the ASN1 parser is set up, which I'm not totally familiar with.  If you are worried about thousands of genes, then maybe a Bio::Cluster-like class for grouping data that generates the objects lazily?
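
As an illustration of the lazy idea only, here is a minimal Python sketch. LazyGeneCluster and the `build` callback are hypothetical names, and whether the ASN1 parser can hand over raw records cheaply like this is an assumption. Note that it defers only the object construction: the raw records are still held in memory.

```python
class LazyGeneCluster:
    """Hypothetical container: keeps raw records, builds objects on demand."""

    def __init__(self, raw_records, build):
        self._raw = list(raw_records)  # unparsed or cheaply parsed records
        self._build = build            # callback that makes the full object
        self._cache = {}               # index -> already built object

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, index):
        # Build the object the first time it is requested, then reuse it.
        if index not in self._cache:
            self._cache[index] = self._build(self._raw[index])
        return self._cache[index]

    def __iter__(self):
        for index in range(len(self._raw)):
            yield self[index]


built = []  # records which raw entries were actually turned into objects


def build_gene(raw):
    built.append(raw)
    return raw.upper()  # stand-in for constructing a real gene object


cluster = LazyGeneCluster(["gene1", "gene2", "gene3"], build_gene)
second = cluster[1]  # only "gene2" has been built at this point
```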

> [1] http://bioperl.996286.n3.nabble.com/sets-of-sequences-how-to-read-td16940.html

I didn't get back to this right away, and unfortunately I have already pushed some changes to the repo.  You should still be able to merge your work in, though, or I can back mine out into a branch and let you merge.  Most of my changes are there to remove the circular dependency issue with Bioperl.

chris


