[Bioperl-l] sets of sequences - how to read?

Mon Jun 17 12:48:56 UTC 2013

The best thing to do in this case is to try contacting the author of Bio::ASN1::EntrezGene to see if the code can be updated. Last time I heard, he was interested in putting the code on GitHub and giving BIOPERLML co-maint; we could easily do that, but I'm not sure whether it is currently hosted anywhere else.

chris

On Jun 17, 2013, at 9:37 AM, Carnë Draug <carandraug+dev at gmail.com> wrote:

> On 17 May 2013 05:08, Fields, Christopher J <cjfields at illinois.edu> wrote:
>> On May 15, 2013, at 8:53 PM, Carnë Draug <carandraug+dev at gmail.com> wrote:
>>> Hi
>>> 
>>> when accessing entrez gene using eutils to get multiple genes, NCBI
>>> now returns an Entrezgene-Set[1] rather than a list of EntrezGene.
>>> This change must have happened sometime in the last 2 months.
>>> 
>>> [...]
>>> 
>>> Carnë
>>> 
>>> [1] http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/asn_spec/Entrezgene-Set.html
>> 
>> This doesn't surprise me too much; I know there have been some changes brewing, but didn't know when they would land.  I guess that would be... <looks at watch>... now.
> 
> Hi,
> 
> for those interested, I have contacted NCBI about this and they have
> reverted the change (see conversation below). Still, Entrezgene-Set
> remains a valid format, so the problem of reading such files still exists.
> 
> Carnë
> 
> 
> ---------- Forwarded message ----------
> Date: 17 May 2013 00:36
> Subject: Entrezegene-Set: recent changes to E-utilities
> 
> Hi
> 
> I believe there was a recent change to the E-utilities service. When
> fetching multiple ASN.1 Entrezgene records from the gene database, it
> now returns an Entrezgene-Set instead of the typical list of
> Entrezgene records, one after the other.
> 
> For example, here is an Entrezgene-Set:
> 
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3014,85235&rettype=asn1&retmode=text
> 
> which used to be the same as a concatenation of:
> 
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3014&rettype=asn1&retmode=text
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=85235&rettype=asn1&retmode=text
> 
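
For readers following along, here is a minimal sketch of how a set could be
split back into the concatenated form described above. It is written in
Python purely for illustration and is not the patch mentioned in this
thread, nor part of Bio::ASN1::EntrezGene. The function name
split_entrezgene_set is hypothetical, and the layout it relies on (an outer
"Entrezgene-Set ::= { ... }" block holding one brace block per record, with
each standalone record starting "Entrezgene ::= {") is an assumption about
the ASN.1 text output.

```python
def split_entrezgene_set(text):
    """Split an ASN.1 text Entrezgene-Set into standalone Entrezgene records.

    Assumes the set looks like 'Entrezgene-Set ::= { {...}, {...} }' and
    that string literals are double-quoted, with '""' as an escaped quote.
    """
    header = "Entrezgene-Set ::="
    start = text.find(header)
    if start == -1:
        raise ValueError("no Entrezgene-Set header found")

    records = []
    depth = 0            # current brace nesting depth
    in_string = False    # True while inside a double-quoted literal
    record_start = None  # index of the '{' opening the current record

    for i in range(start + len(header), len(text)):
        ch = text[i]
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state stays right.
            in_string = not in_string
        elif in_string:
            continue  # braces inside string literals do not count
        elif ch == "{":
            depth += 1
            if depth == 2:
                record_start = i
        elif ch == "}":
            if depth == 2:
                # Re-add the header a single-record file would carry.
                records.append("Entrezgene ::= " + text[record_start:i + 1])
            depth -= 1
            if depth == 0:
                break  # end of the outer set
    return records
```

Each returned string could then, in principle, be handed to a parser that
only understands single Entrezgene records. Whether the existing Perl module
accepts the result unchanged has not been checked here.
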
> This is something new. I don't know exactly when it was introduced, but
> it must have been sometime in the last 2 months.
> 
> I don't know about other programming languages, but at least in Perl
> there is no module able to parse these files. I have already sent a
> patch to the author of the module responsible for reading non-set
> Entrezgene records, but who knows when it will be made available. The
> only workaround is to make multiple requests, one for each UID, which
> will obviously annoy your servers.
> 
> As far as I am aware, there was no notification of this change to
> E-utilities, which worked fine for many years. We did have a lot of
> code that worked fine for years, until it started to fail last month.
> And no one using Perl will be able to parse them until a fix is
> released. Is there any way this change can be reverted?
> 
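
To make the difference between the batched request and the per-UID
workaround concrete, here is a small Python sketch that only builds the
URLs and does not contact NCBI. The base URL and query parameters are the
ones quoted in the message above; the helper names efetch_url and
per_uid_urls are hypothetical.

```python
from urllib.parse import urlencode

# Base URL as quoted in the message above.
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(uids, rettype="asn1", retmode="text"):
    """Build one efetch URL for the gene database covering all given UIDs."""
    params = {
        "db": "gene",
        "id": ",".join(str(u) for u in uids),
        "rettype": rettype,
        "retmode": retmode,
    }
    # safe="," keeps the comma-separated UID list readable in the URL.
    return EFETCH + "?" + urlencode(params, safe=",")


def per_uid_urls(uids):
    """Workaround: one URL (and so one request) per UID."""
    return [efetch_url([u]) for u in uids]


print(efetch_url([3014, 85235]))   # single batched request
for url in per_uid_urls([3014, 85235]):
    print(url)                     # one request per record
```

The batched form costs one round trip regardless of how many UIDs are
requested, while the workaround costs one per UID, which is the server load
the message is concerned about.
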
> 
> ---------- Forwarded message ----------
> Date: 23 May 2013 04:53
> Subject: Re: Entrezegene-Set: recent changes to E-utilities
> 
> Thanks very much for your report. I will discuss this with the Gene
> development team to see why this change occurred and get back to you.
> Out of curiosity, have you considered using the XML format for Gene
> (&retmode=xml)? There are a variety of XML parsers for Perl that should be
> able to read Gene XML.
> 
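
As an illustration of the XML route suggested above, here is a minimal
Python sketch using a generic XML parser. The sample document is a
hand-written, heavily trimmed stand-in, not real efetch output, and the
element names (Entrezgene-Set, Entrezgene, Gene-track_geneid) are assumed
to match the Gene XML format. The function name gene_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for retmode=xml output; real records are far larger.
SAMPLE_XML = """<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>3014</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>85235</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""


def gene_ids(xml_text):
    """Return the gene ID of every Entrezgene record in the document."""
    root = ET.fromstring(xml_text)
    ids = []
    for record in root.iter("Entrezgene"):
        # Look up the (assumed) geneid element anywhere below the record.
        geneid = record.find(".//Gene-track_geneid")
        if geneid is not None and geneid.text:
            ids.append(int(geneid.text))
    return ids


print(gene_ids(SAMPLE_XML))
```

A generic parser like this yields plain values rather than sequence
objects, so any mapping onto an existing object model would still have to
be written by hand.
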
> 
> ---------- Forwarded message ----------
> Date: 24 May 2013 13:01
> Subject: Re: Entrezegene-Set: recent changes to E-utilities
> 
> Thank you for looking into this.
> 
> While there are several XML parsers for Perl, there is not one that
> will return a Bio::Seq object (that is, one that is Bio::SeqIO
> compliant). Of course I could use one of the XML parsers to write my
> own, but then I might as well fix the Entrezgene parser to deal with
> Entrezgene-Sets, which is what I'm doing. I already proposed a patch
> to them, but the inclusion of a new concept, a set of sequences, does
> not really fit the design of Bio::Seq.
> 
> Please do let me know of more news on this. Thank you again,
> 
> 
> ---------- Forwarded message ----------
> Date: 13 June 2013 22:08
> Subject: Re: Entrezegene-Set: recent changes to E-utilities
> 
> The fix for this should now be live. Let us know if you have further
> problems with this.
> 
> Regards,