[Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs

Brian Osborne osborne1 at optonline.net
Thu Feb 16 22:19:16 UTC 2006


Chris,

Yes. The question now is where to easily get the coordinates.

Brian O.
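
One possible source for the coordinates is the gene2accession file mentioned further down in this thread. What follows is only a minimal sketch under stated assumptions, not the code in extract_genes.pl: the column positions are assumed from the file's README as I remember it and should be checked against the copy you download, the positions are assumed to be 0-based, and the names parse_gene2accession_line, load_index and %COL are made up for illustration. It builds a GeneID lookup hash once and caches it with Storable so that later runs skip the big file.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# ASSUMED 0-based column positions in a tab-delimited gene2accession line.
# Check these against the README that ships with the file.
my %COL = (gene_id => 1, genomic_acc => 7, start => 9, end => 10,
           orientation => 11);

# Parse one line into a record. Returns undef for header lines, short lines,
# and entries with no genomic placement ("-" marks a missing value).
sub parse_gene2accession_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/;
    my @f = split /\t/, $line;
    return if @f <= $COL{orientation};
    my ($acc, $start, $end) = @f[ @COL{qw(genomic_acc start end)} ];
    return if grep { $_ eq '-' } $acc, $start, $end;
    return {
        gene_id => $f[ $COL{gene_id} ],
        acc     => $acc,
        start   => $start,                     # assumed 0-based
        end     => $end,                       # assumed 0-based
        strand  => $f[ $COL{orientation} ],    # '+', '-' or '?'
    };
}

# Build GeneID => [records], or reuse the cached copy if one exists.
sub load_index {
    my ($g2a_file, $cache_file) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my %index;
    open my $fh, '<', $g2a_file or die "Cannot open $g2a_file: $!";
    while (my $line = <$fh>) {
        my $rec = parse_gene2accession_line($line) or next;
        push @{ $index{ $rec->{gene_id} } }, $rec;
    }
    close $fh;
    nstore(\%index, $cache_file);
    return \%index;
}
```

A lookup would then be something like load_index('gene2accession', 'g2a.cache')->{$gene_id}, which gives a list of placements because one GeneID can sit on more than one genomic accession. Delete the cache file whenever gene2accession is refreshed.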


On 2/16/06 7:52 AM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve, in GenBank format, a segment of DNA given start and end
> coordinates; that should contain the features you need.  I requested it
> on the mailing list around Nov-Dec but didn't get a chance to test it.
> Would that help?
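
For what it's worth, a rough sketch of how such a segment request might look. It assumes the Bio::DB::GenBank constructor accepts -seq_start, -seq_stop and -strand (with 1 meaning plus and 2 meaning minus), which is how I read its documentation; please check the POD of your installed version. The helper names genbank_segment_args and fetch_segment are invented here, and the input is assumed to be a 0-based start/end plus a '+' or '-' orientation such as you would parse out of gene2accession.

```perl
use strict;
use warnings;

# Convert an ASSUMED 0-based placement with '+'/'-' orientation into
# constructor arguments for Bio::DB::GenBank (1-based coordinates).
sub genbank_segment_args {
    my ($start0, $end0, $orientation) = @_;
    return (
        -seq_start => $start0 + 1,
        -seq_stop  => $end0 + 1,
        -strand    => ($orientation eq '-' ? 2 : 1),   # 1 = plus, 2 = minus
    );
}

# Fetch the segment as a sequence object. Needs BioPerl and network access;
# the module is loaded lazily so the helper above works without it.
sub fetch_segment {
    my ($accession, @placement) = @_;
    require Bio::DB::GenBank;
    my $gb = Bio::DB::GenBank->new( genbank_segment_args(@placement) );
    return $gb->get_Seq_by_acc($accession);
}
```

Called as fetch_segment($accession, $start0, $end0, $orientation), this should return only the requested slice, with whatever features fall inside it, rather than the whole contig.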
> 
> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> 
>> Harry,
>> 
>> It's not clear to me that NCBI's eutils offers this capability directly.
>> You can probably download Entrez Gene entries and parse them for
>> coordinates, but I know of no way to remotely retrieve genomic sequences
>> like this from NCBI (the ENSEMBL API, perhaps?). What I had in mind uses
>> the local approach that some of us favor. To prove to myself that this is
>> simple to do, I wrote a script and just added it to examples/tools. It's
>> called extract_genes.pl and it's based on Bio::DB::Fasta. Download the
>> sequence files for a given species to some directory, download Entrez
>> Gene's gene2accession file, and run the script. It creates and stores a
>> hash for lookups, so it won't read gene2accession each time it runs.
>> 
>> Brian O.
>> 
>> 
>> On 2/14/06 12:15 PM, "Harry Mangalam" <hjm at tacgi.com> wrote:
>> 
>>> Hi Brian,
>>> 
>>> Thanks very much for the pointers and the speed of your reply, and
>>> apologies for the speed of mine.
>>> 
>>> This looks good, but what I was looking for was a bioP approach for
>>> hooking to an API at NCBI or EBI so I could get this info and the seqs
>>> from them.  In this case, speed of retrieval is not critical and I'd
>>> rather not download the entirety of the sequences to a local disk to
>>> hack at them.
>>> 
>>> I've worked out a screen-scraping approach to get them and could script
>>> that, but I thought that bioP had a method for using NCBI's external
>>> APIs, tho it may be that my memory is faulty or the approach is no
>>> longer supported due to overload.
>>> 
>>> Does NCBI make such APIs available anymore?  I searched a bit for docs
>>> on them but couldn't find anything (unless it's buried in the NCBI
>>> toolkit, which I haven't started to excavate).
>>> 
>>> Failing that, would SEALS provide such a service? Any PerlPinnipeds
>>> listening?
>>> 
>>> Harry
>>> 
>>> 
>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>> Harry,
>>>> 
>>>> Hope you're doing well. The approach could be based on Bio::DB::Fasta.
>>>> So, from its documentation:
>>>> 
>>>>   use Bio::DB::Fasta;
>>>> 
>>>>   # create database from directory of fasta files
>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>> 
>>>>   # simple access (for those without Bioperl)
>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>   my @ids     = $db->ids;
>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>> 
>>>>   # Bioperl-style access (reuses the $db handle created above)
>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>   my $fullseq = $obj->seq;
>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>> 
>>>> Do you already have the offsets?
>>>> 
>>>> Brian O.
>>>> 
>>>> On 2/12/06 1:46 AM, "Harry Mangalam" <hjm at tacgi.com> wrote:
>>>>> Hi All,
>>>>> 
>>>>> After perusing the tutorial and other docs for an evening, I still
>>>>> can't find the answer to this.  Forgive me if I've missed something
>>>>> obvious.
>>>>> 
>>>>> This should not be a novel request, but I've not found it answered.
>>>>> If bioperl isn't the best way to do this, I'd be grateful for a pointer
>>>>> to a better way, especially if it includes an illuminating bit of code.
>>>>> 
>>>>> The problem is to retrieve genomic sequences plus & minus some offset
>>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>>> common followup chore for some extra analysis from a gene expression
>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>>>> sequence type to specify...?
>>>>> 
>>>>> 
>>>>> TIA!
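
The "plus & minus some offset" part of the original question can be sketched on top of the Bio::DB::Fasta calls quoted above. This is an illustration only: flank_window and fetch_with_flanks are hypothetical names, the gene coordinates are assumed to be 1-based and inclusive (convert first if your source is 0-based), and the window is simply clamped at the ends of the sequence.

```perl
use strict;
use warnings;

# Widen a 1-based, inclusive [start, end] interval by $offset on each side,
# clamped to the sequence bounds [1, $length].
sub flank_window {
    my ($start, $end, $offset, $length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $offset;
    $from = 1 if $from < 1;
    my $to = $end + $offset;
    $to = $length if $to > $length;
    return ($from, $to);
}

# Extract a locus plus flanks from an indexed directory of FASTA files.
# Needs BioPerl; loaded lazily so flank_window() can be used on its own.
sub fetch_with_flanks {
    my ($fasta_dir, $seq_id, $start, $end, $offset, $minus_strand) = @_;
    require Bio::DB::Fasta;
    my $db = Bio::DB::Fasta->new($fasta_dir);
    my ($from, $to) =
        flank_window($start, $end, $offset, $db->length($seq_id));
    # Passing start > stop asks Bio::DB::Fasta for the reverse complement.
    return $minus_strand ? $db->seq($seq_id, $to => $from)
                         : $db->seq($seq_id, $from => $to);
}
```

For a gene on the minus strand, swapping the two coordinates is what produces the reverse complement, as in the $revseq line of the documentation excerpt.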
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 




