[BioPython] Extracting gene position information from whole chromosome information

Mon Dec 11 16:40:09 UTC 2006

Hi all,

I am trying to understand what would be the best practice in BioPython
to extract gene position information from genomic information.
I am currently using genomic information (as opposed to querying
GenBank for a gene and looking at metadata) mainly because I am doing
genome-wide studies (in fact crossing information with the HapMap
project).

My current strategy (which is quite poor, IMO) is as follows:
1. I get all ASN files for human chromosomes (e.g.
ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_02/hs_ref_chr2.asn.gz )
2. Search (textual 'dump' search, not using any kind of parser) for
the locus of interest (say, lactase).
3. Find the positions in the genome where the gene is encoded.
4. Use them (in my case, to find relevant SNPs in HapMap).
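
To make steps 2 and 4 concrete, they amount to roughly the following.
This is only a minimal sketch in plain Python (no BioPython involved).
The function names are made up for illustration, and the (name,
position) SNP tuples are a stand-in, not the real HapMap file format.

```python
import gzip


def find_locus_lines(path, term):
    """Yield (line_number, text) for every line of a gzipped text file
    that mentions `term`, ignoring case.

    This is a plain textual scan: it knows nothing about ASN.1 structure.
    """
    term = term.lower()
    # "rt" opens the compressed file as text, so it is read line by line
    # without decompressing the whole chromosome to disk first.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if term in line.lower():
                yield number, line.rstrip("\n")


def snps_in_region(snps, start, end):
    """Return the (name, position) pairs whose position lies in
    [start, end], both ends inclusive.

    `snps` is a hypothetical list of tuples, not an actual HapMap record.
    """
    return [(name, pos) for name, pos in snps if start <= pos <= end]
```

I would call find_locus_lines("hs_ref_chr2.asn.gz", "lactase"), read the
gene coordinates off the surrounding lines by hand (step 3), and then
pass them to snps_in_region() together with the SNP positions.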

Being a BioPython newbie, I am left with some questions:
1. Are there any mechanisms (BioPython-wise) to parse genome-wide
(whole-chromosome) ASN files? I could not find any in the cookbook...
2. Would this be the best strategy (Searching for annotations in
single gene files would be another strategy...)?

Thanks a lot,
Tiago
-- 
For every expert, there is an equal and opposite expert. - Arthur C. Clarke