[Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report?

Dave Messina David.Messina at sbc.su.se
Tue Mar 9 21:39:08 UTC 2010


Hi Bhakti,

Forgive me if what follows shows that I've totally misunderstood; it's late here.


> The blast table does show the hit organism
> accession number,

As you say, in BLAST -m 8 (tabular) reports, the hit's accession number is in the second column.
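
To make the column layout concrete, here is a minimal sketch of pulling the hit ID and hit coordinates out of one -m 8 line. It uses plain Python rather than BioPerl only to keep it dependency-free, and it assumes the standard 12 tab-separated columns (query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, e-value, bit score). The names BlastHit and parse_m8_line and the example line are made up for illustration.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """The few -m 8 fields needed to locate a hit on the subject sequence."""
    query_id: str
    subject_id: str   # column 2: the hit's ID/accession
    s_start: int      # lower subject coordinate (1-based)
    s_end: int        # higher subject coordinate (1-based)
    evalue: float


def parse_m8_line(line: str) -> BlastHit:
    """Parse one tab-separated line of a BLAST -m 8 report."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    # For minus-strand hits, s. start is larger than s. end,
    # so normalise the pair to (low, high).
    low, high = sorted((int(fields[8]), int(fields[9])))
    return BlastHit(fields[0], fields[1], low, high, float(fields[10]))


# Hypothetical example line, not taken from a real report.
example = "q1\tEXAMPLE_CHR1\t98.50\t200\t3\t0\t1\t200\t5200\t5001\t1e-100\t370\n"
hit = parse_m8_line(example)
print(hit.subject_id, hit.s_start, hit.s_end)
```

Sorting the two subject coordinates is a design choice here: it discards strand information, which you would want to keep separately if strand matters for your lookup.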

I'm not sure when this would be different from the gene's accession number, at least for the entries in nr for which a gene name has been assigned (some are known only by their accession number).


> Based on the Hits Start and End positions, how can I
> retrieve the gene name/acc/id?

The short answer is 'you can't'.

But this makes me think that you're searching not against the nr database but against whole-genome or chromosome sequence records. In that case, some of them will have genes annotated in the feature table, which you can extract using BioPerl:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

But many (most?) won't be annotated in this way, in which case you will need to find a file or database that lists the start and stop positions of every gene on the sequence you're searching, and then look for the genes whose coordinates overlap your hit.
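
Once you have such a coordinate table, the lookup itself is just an interval-overlap test. Below is a rough sketch in plain Python, assuming the table has already been loaded as (sequence id, start, end, gene name) tuples with 1-based inclusive coordinates. The function name genes_overlapping, the table layout, and the gene entries are all hypothetical; a real annotation source (a GFF file, for instance) would need its own parsing.

```python
from typing import List, Tuple

# (sequence id, start, end, gene name); coordinates are 1-based, inclusive.
GeneRecord = Tuple[str, int, int, str]


def genes_overlapping(seq_id: str, start: int, end: int,
                      genes: List[GeneRecord]) -> List[str]:
    """Return names of genes on seq_id that overlap the range start..end."""
    low, high = sorted((start, end))  # tolerate minus-strand (reversed) hits
    return [
        name
        for g_seq, g_start, g_end, name in genes
        # Two inclusive intervals overlap when each starts before the other ends.
        if g_seq == seq_id and g_start <= high and g_end >= low
    ]


# Made-up annotation table for illustration only.
gene_table = [
    ("EXAMPLE_CHR1", 1000, 2500, "geneA"),
    ("EXAMPLE_CHR1", 4800, 5100, "geneB"),
    ("EXAMPLE_CHR1", 5150, 6000, "geneC"),
    ("EXAMPLE_CHR2", 5000, 5200, "geneD"),
]

print(genes_overlapping("EXAMPLE_CHR1", 5200, 5001, gene_table))
```

A linear scan like this is fine for a handful of hits; for many hits against a large genome you would probably want the gene table sorted or indexed per sequence.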


Perhaps you could provide a couple of your hits as examples so the problem is clearer?


Dave




