[Bioperl-l] fetching exons in genomic coordinates from NCBI

Chris Fields cjfields at illinois.edu
Fri May 27 10:13:59 UTC 2011


On May 27, 2011, at 11:22 AM, Dave Messina wrote:

> On Fri, May 27, 2011 at 10:20, Reece Hart <reecehart at gmail.com> wrote:
> 
>> On Wed, May 25, 2011 at 5:13 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>>> As far as I know, you're doing it the NCBI recommended way, byzantine
>>> though it may be. Of course I too would be keen to hear of a better approach
>>> if anyone's got one.
>>> 
>> 
>> Is that really a "recommended" way? Aside from the NCBI eutils pages which
>> describe how to submit queries, I didn't see anything about how to process
>> the results.
> 
> When I said that, I was thinking about the esearch and efetch part, but now
> that I look around, I believe that yes, the NCBI expects us to parse the XML
> using XML libraries such as libXML.
> 
>> ...
> 
> ...
> I am certainly not an expert in this area, but yeah, it sure seems like
> there should be some more human-readable guide to their XML formats than
> just the above.
> 
> Dave

Brian Osborne and I set up a page to answer some of these questions (Brian's answer is for EntrezGene XML, and there is an EUtilities example). It's here:

http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

If you were going a pure eutils-based route, you could probably kludge something together from the various examples to get at what you want. Note that esummary gives you coords as well, is shorter, and has some OO-based ways to get at the data generically:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
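
To make the esummary route a bit more concrete, here is a rough sketch of the idea. It is in Python rather than BioPerl, only so that it needs nothing beyond the standard library; it is not the code from either HOWTO. The function names (esummary_url, genomic_coords) are made up for illustration, and the element names (GenomicInfoType, ChrAccVer, ChrStart, ChrStop) are what I would expect in version 2.0 gene summaries, so treat them as assumptions and check them against a real response. The sample XML is a hand-written placeholder, not real NCBI output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(gene_id, db="gene"):
    """Build an esummary request URL for a single record (hypothetical helper)."""
    params = {"db": db, "id": gene_id, "version": "2.0"}
    return EUTILS_BASE + "/esummary.fcgi?" + urlencode(params)


def genomic_coords(xml_text, tags=("ChrAccVer", "ChrStart", "ChrStop")):
    """Return one dict per GenomicInfoType element found in the summary XML.

    Assumption: coordinates live in child elements named as in `tags`.
    Values are returned as strings, exactly as they appear in the XML;
    a missing child element yields None.
    """
    root = ET.fromstring(xml_text)
    return [
        {tag: info.findtext(tag) for tag in tags}
        for info in root.iter("GenomicInfoType")
    ]


# Hand-written placeholder document (NOT real NCBI output), used only to
# show the shape of the result without making a network request.
SAMPLE = """<eSummaryResult>
  <DocumentSummarySet>
    <DocumentSummary uid="0">
      <GenomicInfo>
        <GenomicInfoType>
          <ChrAccVer>NC_000000.0</ChrAccVer>
          <ChrStart>200</ChrStart>
          <ChrStop>100</ChrStop>
        </GenomicInfoType>
      </GenomicInfo>
    </DocumentSummary>
  </DocumentSummarySet>
</eSummaryResult>"""

coords = genomic_coords(SAMPLE)
```

To run it against a real record, fetch esummary_url(your_gene_id) with urllib.request.urlopen and pass the decoded response body to genomic_coords. Note that this only gets you the span of the gene on the chromosome; for individual exons you would still need the EntrezGene XML approach from the first HOWTO.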

chris
