[Biopython] searching for a human chromosome position
Brad Chapman
chapmanb at 50mail.com
Mon Jun 1 12:20:52 UTC 2009
dr goettel:
> > I want to, given a number of a human chromosome, the position of the
> > nucleotide and the nucleotide that should be in this position, search for
> > that position and determine if there has been a mutation and if that
> > mutation produces an aminoacid change or not.
Peter:
> Given a human chromosome (e.g. as a FASTA or GenBank file from the
> NCBI) and a location on it, you can easily use Biopython to extract
> that position (or region).
Agreed with Peter here -- this is not a straightforward task.
Generally, the steps I would use are:
- Define a reference genome to use, along with feature mappings of
gene models.
- Parse the gene models (normally as GenBank format or GFF) and
extract locations of coding regions.
- Use the coding region locations to build a hash table mapping
locations to coding identifiers. For this type of hash, Berkeley DB
is useful and in the standard library. There are also many other
key/value document stores out there that handle the task well.
- Use your lookup hash to determine if potential SNP bases fall into
coding regions.
- If so, use your parsed gene model locations to identify the
position in the coding sequence. You will have to remap
coordinates to account for exons/introns, and manage coding
sequences on the reverse strand.
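The lookup steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the region list, the identifiers and the helper names (build_lookup, coding_id_for) are made up for the example, coordinates are taken to be 1-based and inclusive, and a plain in-memory dict stands in for Berkeley DB or another key/value store.

```python
# Hypothetical coding regions: (chromosome, start, end, identifier).
# Coordinates are assumed to be 1-based and inclusive; real values would
# come from the parsed GenBank or GFF gene models.
CODING_REGIONS = [
    ("chr1", 100, 150, "geneA"),
    ("chr1", 300, 320, "geneA"),
    ("chr2", 50, 80, "geneB"),
]


def make_key(chrom, pos):
    """Build a string key, so the same scheme would also work with an
    on-disk key/value store that only accepts string keys."""
    return "{}:{}".format(chrom, pos)


def build_lookup(regions):
    """Map every coding position to the identifier of its coding region."""
    lookup = {}
    for chrom, start, end, identifier in regions:
        for pos in range(start, end + 1):
            lookup[make_key(chrom, pos)] = identifier
    return lookup


def coding_id_for(lookup, chrom, pos):
    """Return the coding identifier at this position, or None if the
    position is not in any coding region."""
    return lookup.get(make_key(chrom, pos))


lookup = build_lookup(CODING_REGIONS)
print(coding_id_for(lookup, "chr1", 120))  # inside a coding region
print(coding_id_for(lookup, "chr1", 200))  # between the two regions
```

Keying on every single base keeps the query a plain hash lookup, at the cost of one entry per coding base; that size is the reason a disk-backed store is attractive for a whole genome.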
A re-usable component to do the last step (the coordinate remapping)
would be generally useful to a lot of people.
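To make that last step concrete, here is a minimal sketch of the remapping, not an existing Biopython function. It assumes the coding segments of one gene are given as 1-based inclusive (start, end) pairs in genomic coordinates, that strand is +1 or -1, and that the SNP base is reported on the forward strand; the names cds_offset, codon_position and coding_strand_base are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cds_offset(exons, strand, pos):
    """Return the 0-based offset of genomic position `pos` in the coding
    sequence, or None if `pos` is in an intron or outside the gene.

    exons  -- list of (start, end) coding segments, 1-based inclusive
    strand -- +1 for the forward strand, -1 for the reverse strand
    """
    # On the reverse strand the coding sequence starts at the highest
    # genomic coordinate, so walk the segments in descending order.
    ordered = sorted(exons, reverse=(strand == -1))
    offset = 0
    for start, end in ordered:
        if start <= pos <= end:
            if strand == 1:
                return offset + (pos - start)
            return offset + (end - pos)
        # Skip this whole segment; introns contribute nothing.
        offset += end - start + 1
    return None


def codon_position(offset):
    """Split a CDS offset into (codon index, position within codon)."""
    return offset // 3, offset % 3


def coding_strand_base(base, strand):
    """Convert a forward-strand base to the base seen in the coding
    sequence, complementing it for reverse-strand genes."""
    return base if strand == 1 else COMPLEMENT[base]


exons = [(100, 150), (300, 320)]
print(cds_offset(exons, 1, 300))    # first base of the second segment
print(cds_offset(exons, -1, 150))   # same gene model read on the reverse strand
print(codon_position(cds_offset(exons, 1, 300)))
print(coding_strand_base("A", -1))
```

With the codon index and the position inside the codon, the affected codon can be pulled from the coding sequence and translated with and without the change (for example with Biopython's Seq.translate) to see whether the amino acid differs.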
Brad