[Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
Jason Stajich
jason.stajich at duke.edu
Tue Feb 14 18:25:21 UTC 2006
Are you working with species that are in Ensembl? Is what you need not
provided by Ensembl/EnsMart? They seem to be doing the best job of
integrating gene IDs in a central place.
It is not exactly clear which API you are referring to. You can query
Entrez via Bio::DB::Query::GenBank, so if you can construct your query
in Entrez syntax you can run it and retrieve the results in bioperl.
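
A minimal sketch of that route, in case it helps. The helper names
(entrez_gene_query, fetch_by_query) are made up for illustration, and the
[Gene Name] and [Organism] field tags are only my assumption about which
Entrez fields fit a HUGO-style symbol search; check them against the
Entrez help before relying on them. The network call runs only when a
gene symbol is given on the command line.

```perl
use strict;
use warnings;

# Build an Entrez query string for a gene symbol in a given organism.
# The [Gene Name] and [Organism] field tags are assumptions; adjust
# them to whatever Entrez fields suit your search.
sub entrez_gene_query {
    my ($symbol, $organism) = @_;
    return sprintf '%s[Gene Name] AND %s[Organism]', $symbol, $organism;
}

# Run the query at NCBI and return the matching Bio::Seq objects.
# BioPerl is loaded lazily so the helper above works without it.
sub fetch_by_query {
    my ($query_string, $db) = @_;
    require Bio::DB::Query::GenBank;
    require Bio::DB::GenBank;

    my $query = Bio::DB::Query::GenBank->new(
        -db    => $db || 'nucleotide',
        -query => $query_string,
    );

    # get_Stream_by_query returns a Bio::SeqIO-style stream of records.
    my $stream = Bio::DB::GenBank->new->get_Stream_by_query($query);
    my @seqs;
    while ( my $seq = $stream->next_seq ) {
        push @seqs, $seq;
    }
    return @seqs;
}

# Only touch the network when a gene symbol is supplied.
if (@ARGV) {
    my $q = entrez_gene_query( $ARGV[0], $ARGV[1] || 'Homo sapiens' );
    for my $seq ( fetch_by_query($q) ) {
        printf "%s\t%d bp\n", $seq->display_id, $seq->length;
    }
}
```

Saved under any name you like, it takes a gene symbol and an optional
organism as arguments. A broad query can match many records, so it is
worth checking $query->count before pulling the whole stream.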
-jason
On Feb 14, 2006, at 12:15 PM, Harry Mangalam wrote:
> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply, and
> apologies for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for
> hooking to an API at NCBI or EBI so I could get this info and seqs
> from them. In this case, speed of retrieval is not critical and I'd
> rather not download the entirety of the sequences to a local disk to
> hack at them.
>
> I've determined a screen-scraping approach to get them and could
> script that, but I thought that bioP had a method for using NCBI's
> external APIs, tho it may be that my memory is faulty or the approach
> is no longer supported due to overload.
>
> Does NCBI make such APIs available anymore? I searched a bit for
> docs on them but couldn't find anything (unless it's buried in the
> NCBI toolkit, which I haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinnipeds
> listening?
>
> Harry
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on
>> Bio::DB::Fasta. So, from its documentation:
>>
>> use Bio::DB::Fasta;
>>
>> # create database from directory of fasta files
>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>> # simple access (for those without Bioperl)
>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>> my @ids = $db->ids;
>> my $length = $db->length('CHROMOSOME_I');
>> my $alphabet = $db->alphabet('CHROMOSOME_I');
>> my $header = $db->header('CHROMOSOME_I');
>>
>> # Bioperl-style access
>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
>> my $seq = $obj->seq;
>> my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam" <hjm at tacgi.com> wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this. Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it
>>> answered. If bioperl isn't the best way to do this, I'd be grateful
>>> for a pointer to a better way, especially if it includes an
>>> illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some
>>> offset from a locus determined by HUGO keyword or GeneID. This would
>>> be a common followup chore for some extra analysis from a gene
>>> expression expt. Or maybe this is in the DBFetch routines, but I've
>>> missed the sequence type to specify...?
>>>
>>>
>>> TIA!
>
> --
> Cheers, Harry
> Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com
> <<plain text preferred>>
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12