[BioPython] Extracting gene position information from whole chromosome information

Tue Dec 12 11:41:27 UTC 2006

On Monday 11 December 2006 11:40, Tiago Antão wrote:
> Hi all,
>
> I am trying to understand what would be the best practice in BioPython
> to extract gene position information from genomic information.
> I am currently using genomic information (as opposed to querying
> GenBank for a gene and looking at metadata) mainly because I am doing
> genome-wide studies (in fact crossing information with the HapMap
> project).
>
> My current strategy (which is quite poor, IMO) is as follows:
> 1. I get all ASN files for human chromosomes (e.g.
> ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_02/hs_ref_chr2.asn.gz )
> 2. Search (textual 'dump' search, not using any kind of parser) for
> the locus of interest (say, lactase).
> 3. Find the positions in the genome where the gene is coded
> 4. Use it (in my case to find relevant SNPs in HapMap)
>
> For me (being a BioPython newbie), I end up with some doubts:
> 1. Are there any mechanisms (BioPython-wise) to parse genome-wide
> (whole-chromosome) ASN files? I could not find any in the cookbook...
> 2. Would this be the best strategy (Searching for annotations in
> single gene files would be another strategy...)?

I would simply use the UCSC Table Browser (or their downloadable
tab-delimited text files) to look up the gene positions.
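
As a rough illustration of the tab-delimited route, here is a minimal
sketch in plain Python (no BioPython needed). It assumes the column order
of UCSC's refGene table (a leading "bin" column followed by the genePredExt
fields, with the gene symbol in "name2"); check the table schema for the
file you actually download. The function name gene_positions and the two
sample rows are made up for the example and are not real annotations.

```python
import csv
import io

# Assumed column order: UCSC refGene table ("bin" + genePredExt fields).
FIELDS = ["bin", "name", "chrom", "strand", "txStart", "txEnd",
          "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
          "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames"]


def gene_positions(handle, symbol):
    """Return (chrom, strand, txStart, txEnd) for each transcript of `symbol`.

    `handle` is any iterable of tab-delimited lines (hypothetical helper,
    not part of any library).
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        rec = dict(zip(FIELDS, row))
        if rec.get("name2") == symbol:
            hits.append((rec["chrom"], rec["strand"],
                         int(rec["txStart"]), int(rec["txEnd"])))
    return hits


# Made-up rows in the assumed layout, only to show the parsing.
SAMPLE = (
    "1\tNM_000001\tchr2\t-\t1000\t5000\t1200\t4800\t2\t"
    "1000,3000,\t2000,5000,\t0\tGENE_A\tcmpl\tcmpl\t0,1,\n"
    "2\tNM_000002\tchr2\t+\t9000\t12000\t9100\t11900\t1\t"
    "9000,\t12000,\t0\tGENE_B\tcmpl\tcmpl\t0,\n"
)

positions = gene_positions(io.StringIO(SAMPLE), "GENE_A")
for chrom, strand, start, end in positions:
    print(chrom, strand, start, end)
```

With a real download you would pass an open file instead of the sample
string, e.g. gzip.open(path, "rt") for a gzipped table. Two things to watch
when crossing the result with HapMap: UCSC table start coordinates are
0-based (half-open intervals), and the genome build of the table should
match the build the SNP positions were reported on.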

Sean