[Biopython] How to efetch Unigene records? Is it possible at all?

Carlos Javier Borroto carlos.borroto at gmail.com
Thu Jul 30 22:27:24 UTC 2009


On Thu, Jul 30, 2009 at 6:09 PM, Brad Chapman<chapmanb at 50mail.com> wrote:
> Hi Carlos;
>
>> My task is to start from a list of human genes of interest and grab their
>> protein counterparts in the database to do some additional work.
>
> It looks like you are doing things correctly, but I'm not sure if
> NCBI supports retrieving UniGene records through the efetch
> interface. I tried playing around with it for a bit and got the same
> problems as you; the documentation on their site is also not very
> clear about whether UniGene is supported and which return types to request.
> Not having a lot of experience with UniGene, my guess is this isn't
> the right direction to go.
>
> My suggestion to get your work done is to download the *.data files
> from the ftp site:
>
> ftp://ftp.ncbi.nih.gov/repository/UniGene/
>
> and write a script that runs through these and pulls out the protein
> identifiers of interest. You should be able to use the UniGene
> parser for this and use the protsim attribute of each record. With
> these, you can get the GI number (protgi attribute) and use this to
> fetch the relevant GenBank records through Entrez.
>
> Hope this helps,
> Brad
>

Thanks. Since this is the first time I have used Biopython or the NCBI
scripting facilities, I was wondering whether I was doing something
completely wrong. I'm going to follow your advice.
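
For anyone finding this thread later, here is a rough sketch of the
approach Brad describes above. It does not use Biopython's UniGene
parser; it reads the flat file directly with the standard library so
that it runs on its own. The file layout (ID, GENE and PROTSIM lines,
records ending in "//", PROTSIM fields written as "KEY=value; ...") is
my assumption about the *.data format, and the function name, the
sample records and the GI values are made up for illustration.

```python
import io


def iter_protein_gis(handle, genes_of_interest, org=None):
    """Yield (unigene_id, gene, protgi) for records whose GENE is wanted.

    Hypothetical helper. Assumes "TAG   value" lines, "//" between
    records, and PROTSIM lines made of "KEY=value" pairs split by ";".
    If org is given, only PROTSIM lines with that ORG value are kept.
    """
    wanted = set(genes_of_interest)
    unigene_id, gene, gis = None, None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):
            # End of record: emit the collected GIs if the gene is wanted.
            if gene in wanted:
                for gi in gis:
                    yield unigene_id, gene, gi
            unigene_id, gene, gis = None, None, []
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "ID":
            unigene_id = value
        elif tag == "GENE":
            gene = value
        elif tag == "PROTSIM":
            # Turn "ORG=10090; PROTGI=1000001; ..." into a dict.
            fields = dict(
                item.strip().split("=", 1)
                for item in value.split(";")
                if "=" in item
            )
            if "PROTGI" in fields and org in (None, fields.get("ORG")):
                gis.append(fields["PROTGI"])


# Made-up sample in the assumed layout; not real UniGene data.
SAMPLE = """\
ID          Hs.0001
GENE        GENEA
PROTSIM     ORG=10090; PROTGI=1000001; PROTID=XX_000001.1; PCT=76.55; ALN=288
PROTSIM     ORG=9606; PROTGI=1000002; PROTID=XX_000002.1; PCT=100.00; ALN=288
//
ID          Hs.0002
GENE        GENEB
PROTSIM     ORG=9606; PROTGI=1000003; PROTID=XX_000003.1; PCT=100.00; ALN=10
//
"""

hits = list(iter_protein_gis(io.StringIO(SAMPLE), ["GENEA"]))
human_hits = list(iter_protein_gis(io.StringIO(SAMPLE), ["GENEA"], org="9606"))
for unigene_id, gene, gi in hits:
    print(unigene_id, gene, gi)
```

With Biopython, the same values should be available through the protsim
attribute of each parsed record (protgi on each entry), as Brad says.
The collected GI numbers could then be passed to Bio.Entrez.efetch with
db="protein" to pull the GenBank records.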

Thank you for taking the time to review my concern.
regards,
-- 
Carlos Javier


