[Biopython] searching for a human chromosome position

Brad Chapman chapmanb at 50mail.com
Mon Jun 1 08:20:52 EDT 2009


dr goettel:
> > I want to, given a number of a human chromosome, the position of the
> > nucleotide and the nucleotide that should be in this position, search for
> > that position and determine if there has been a mutation and if that
> > mutation produces an aminoacid change or not. 

Peter:
> Given a human chromosome (e.g. as a FASTA or GenBank file from the
> NCBI) and a location on it, you can easily use Biopython to extract
> that position (or region).

Agreed with Peter that extracting the position is the easy part.
Working out whether a change alters an amino acid is not a
straightforward task. Generally, the steps I would use would be:

- Define a reference genome to use, along with feature mappings of
  gene models.

- Parse the gene models (normally as GenBank format or GFF) and
  extract locations of coding regions.

- Use the coding region locations to build a hash table of locations
  to coding identifiers. For this type of hash, Berkeley DB is
  useful and is in the Python 2 standard library (the bsddb
  module). There are also many other key/value document stores out
  there that handle the task well.

- Use your lookup hash to determine if potential SNP bases fall into
  coding regions.

- If so, use your parsed gene model locations to identify the
  position in the coding sequence. You will have to remap
  coordinates to account for exons/introns, and manage coding
  sequences on the reverse strand.
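
As a rough illustration of the lookup steps above, here is a minimal
sketch that uses a plain in-memory dictionary instead of Berkeley DB.
The function names, the bin size, and the example regions are all
made up for illustration, and coordinates are assumed to be 0-based
and end-exclusive. A real gene model parser would supply the regions.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # assumed bin width; tune for your genome and feature density


def build_lookup(coding_regions):
    """Index coding regions by (chromosome, bin) for fast position lookups.

    coding_regions: iterable of (chrom, start, end, coding_id) tuples using
    0-based, end-exclusive coordinates (an assumption of this sketch).
    """
    lookup = defaultdict(list)
    for chrom, start, end, coding_id in coding_regions:
        # Register the region in every bin it overlaps.
        for bin_index in range(start // BIN_SIZE, (end - 1) // BIN_SIZE + 1):
            lookup[(chrom, bin_index)].append((start, end, coding_id))
    return lookup


def coding_ids_at(lookup, chrom, position):
    """Return sorted identifiers of coding regions containing a 0-based position."""
    hits = lookup.get((chrom, position // BIN_SIZE), [])
    return sorted({cid for start, end, cid in hits if start <= position < end})


# Hypothetical coding regions, purely for demonstration.
regions = [
    ("chr1", 1_000, 1_200, "geneA"),
    ("chr1", 1_500, 1_650, "geneA"),
    ("chr1", 99_950, 100_050, "geneB"),  # spans a bin boundary
]
lookup = build_lookup(regions)
```

Binning keeps each lookup to a short list scan. The same
(chromosome, bin) keys could be serialized into an on-disk key/value
store if the table does not fit in memory.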

A re-usable component to do the last part would be generally useful
to a lot of people.
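
To give an idea of what such a component might look like, here is a
pure-Python sketch of the coordinate remapping and the amino acid
comparison. It is not an existing Biopython API. The function names
are hypothetical, and it assumes 0-based end-exclusive exon
intervals, a strand of +1 or -1, an observed base reported on the
forward strand, the standard genetic code, and a spliced coding
sequence whose length is a multiple of three.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def genomic_to_cds(exons, strand, position):
    """Map a 0-based genomic position to a 0-based offset in the spliced CDS.

    exons: coding exon intervals as (start, end) in genomic coordinates.
    Returns None when the position does not fall in a coding exon.
    """
    # Walk exons in transcription order: ascending for +1, descending for -1.
    ordered = sorted(exons, reverse=(strand == -1))
    offset = 0
    for start, end in ordered:
        if start <= position < end:
            within = position - start if strand == 1 else end - 1 - position
            return offset + within
        offset += end - start
    return None


def amino_acid_change(cds, exons, strand, position, observed_base):
    """Return (reference_aa, observed_aa), or None outside coding exons.

    cds: spliced coding sequence, written 5' to 3' on the coding strand.
    observed_base: base seen at the position, given on the forward strand.
    """
    cds_pos = genomic_to_cds(exons, strand, position)
    if cds_pos is None:
        return None
    base = observed_base.upper()
    if strand == -1:
        base = COMPLEMENT[base]  # convert to the coding strand
    in_codon = cds_pos % 3
    codon_start = cds_pos - in_codon
    ref_codon = cds[codon_start:codon_start + 3].upper()
    new_codon = ref_codon[:in_codon] + base + ref_codon[in_codon + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[new_codon]


# Hypothetical two-exon gene: ATGGCC from the first exon, TAA from the second.
example_exons = [(10, 16), (20, 23)]
example_cds = "ATGGCCTAA"
result = amino_acid_change(example_cds, example_exons, 1, 13, "A")
```

A change is synonymous when the two returned amino acids are equal.
With Biopython available, the translate method of a Seq object could
replace the hand-built codon table.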

Brad

