[Bioperl-l] Nucleotide Links in Gene DB (GenBank)

Arkady bamboowarrior at gmail.com
Fri Apr 11 23:10:35 UTC 2008


Hi everyone, I'm a BioPerl n00b. Actually, kind of a GenBank n00b,
too, as I'm from a CS background and only started doing bio things last
June.

I'm trying to set up an analysis pipeline for primate protein CDSs (the
nucleotide sequences). I've written a script which does a pretty decent
job of downloading these from GenBank, but the results are inconsistent,
because a lot of sequences in the 'nucleotide' database are 'predicted'
and named LOCthisorthat instead of by gene name.

So what I was thinking was this (assume ANKRD43 is the gene for this example):

1. Search the 'gene' database for ANKRD43 AND (PRI*[ORGN]).
On the NCBI website, there's an option to show all nucleotide links for
a gene record. How do I get a list of those in BioPerl? Can BioPerl even
search 'gene', or just 'nucleotide'?

2. Search 'nucleotide' for the items referenced in #1, and also for
ANKRD43[TITL] AND (PRI*[ORGN]), and save the CDSs.

3. BLAST one of those CDSs against mRNA, see if we pick up any other matches.

4. BLAT the CDSs against other primates, see if we find anything not in GenBank.
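
To make steps 1 and 2 concrete, here is a rough sketch of the requests
they would boil down to if sent to NCBI's E-utilities (ESearch and
ELink) directly. It is written in plain Python rather than BioPerl and
only builds the URLs; it does not contact NCBI. The helper names, the
'nuccore' database name standing in for 'nucleotide', and the gene IDs
in the usage note are assumptions for illustration, not anything
confirmed to work for this pipeline.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_search_url(gene, retmax=100):
    """Step 1: ESearch against the 'gene' database (hypothetical helper)."""
    term = f"{gene} AND (PRI*[ORGN])"  # query string taken from the post
    params = {"db": "gene", "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def gene_to_nucleotide_url(gene_ids):
    """Step 1, continued: ELink from gene records to linked nucleotide records.

    Assumes 'nuccore' is the target database for the nucleotide links.
    """
    params = {"dbfrom": "gene", "db": "nuccore", "id": ",".join(gene_ids)}
    return f"{EUTILS}/elink.fcgi?" + urlencode(params)


def title_search_url(gene, retmax=100):
    """Step 2: ESearch against nucleotide records by title (hypothetical helper)."""
    term = f"{gene}[TITL] AND (PRI*[ORGN])"  # query string taken from the post
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)
```

For example, gene_search_url("ANKRD43") gives the step 1 search, and
gene_to_nucleotide_url(["111", "222"]) gives the link lookup, where
"111" and "222" are placeholders for whatever IDs the first search
returns.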


On the other hand, I always get the feeling I'm doing things the hard
way--especially here, with #1 and #2. Is there a much more obvious,
simple way to do this?

Thanks, folks.


Cheers,
John Woods

Institute for Cellular and Molecular Biology
The University of Texas at Austin


