[Biopython] searching for a human chromosome position

Mon Jun 1 13:15:46 UTC 2009

On Monday 01 June 2009 12:54:35 Peter wrote:
> On Fri, May 29, 2009 at 10:36 AM, dr goettel <biopythonlist at gmail.com> wrote:
> > Hello,
> > I am new to Biopython and, after reading the documentation, I'd like
> > some guidance on solving one "simple" thing.
> > Given a human chromosome number, the position of a nucleotide and the
> > nucleotide that should be at that position, I want to look up that
> > position and determine whether there has been a mutation and whether
> > that mutation produces an amino acid change. I suppose that first of
> > all I have to query a genome database(?) using the Entrez module and
> > retrieve the sequence containing this base. Then I suppose I have to
> > look at the translations of this sequence, work out the most probable
> > reading frame, and then see whether there is an amino acid change or
> > not.
> >
> > Could anybody please send some clues for querying the database and
> > finding the most probable reading frame for translation to protein
> > (assuming this is a good workflow for this particular problem)?
> >
> > Thank you very much.
> > d
>
> I don't think your task is "simple".
>
I should have added a :-) right after "simple".

> Given a human chromosome (e.g. as a FASTA or GenBank file from the
> NCBI) and a location on it, you can easily use Biopython to extract
> that position (or region).
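
A minimal sketch of that extraction step, assuming the chromosome is available as a FASTA file and that positions are counted from 1. The toy sequence and the helper names `base_at` and `region` are made up for illustration; a real chromosome file would be opened by filename instead of through `StringIO`.

```python
from io import StringIO

from Bio import SeqIO

# Toy stand-in for a chromosome FASTA file (hypothetical sequence).
fasta_text = ">toy_chromosome\nACGTACGTAAGGCCTT\n"


def base_at(record, position):
    """Return the base at a 1-based position of a SeqRecord."""
    if not 1 <= position <= len(record.seq):
        raise ValueError("position outside the sequence")
    # Python indexing is 0-based, so position 1 is index 0.
    return str(record.seq[position - 1])


def region(record, start, end):
    """Return the 1-based, inclusive region start..end as a string."""
    return str(record.seq[start - 1:end])


record = SeqIO.read(StringIO(fasta_text), "fasta")
print(base_at(record, 5))
print(region(record, 9, 12))
```

The only subtle point is the off-by-one conversion between 1-based genome coordinates and 0-based Python indices.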

> You could also look at the provided annotation in the GenBank file to
> see if the location falls within a gene CDS, and thus if a mutation at
> that position would cause an amino acid change. Note that because in
> humans you have introns/exons to worry about, this is actually quite
> complicated! (If you don't want to use the existing annotation, you
> would have to do your own gene finding, which is even more
> complicated.)
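
A rough sketch of the annotation-based check described above, using a toy record built in memory rather than a real GenBank file. It assumes a forward-strand CDS that starts in frame 1 and uses the standard genetic code; reverse-strand genes and a `codon_start` qualifier other than 1 would need extra handling. The function name `coding_effect` and the toy sequence are hypothetical.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import CompoundLocation, FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Toy "chromosome" (hypothetical): 2 flanking bases, exon 1, an intron,
# exon 2, 2 flanking bases. The spliced CDS reads ATG GCC AAA TGA.
record = SeqRecord(Seq("CC" + "ATGGCC" + "GTAAGT" + "AAATGA" + "GG"), id="toy")
cds = SeqFeature(
    CompoundLocation([
        FeatureLocation(2, 8, strand=1),    # exon 1 (0-based, end exclusive)
        FeatureLocation(14, 20, strand=1),  # exon 2
    ]),
    type="CDS",
)


def coding_effect(record, cds, position, new_base):
    """Return (old_codon, new_codon, old_aa, new_aa) for a substitution at a
    1-based position, or None if the position is outside the CDS exons.
    Forward-strand CDS only."""
    index = position - 1
    if index not in cds:
        return None
    # Walk the exons to turn the genomic index into an offset in the
    # spliced coding sequence.
    offset = 0
    for part in cds.location.parts:
        if index in part:
            offset += index - part.start
            break
        offset += len(part)
    coding = str(cds.extract(record.seq))
    frame = offset % 3
    old_codon = coding[offset - frame:offset - frame + 3]
    new_codon = old_codon[:frame] + new_base + old_codon[frame + 1:]
    old_aa = str(Seq(old_codon).translate())
    new_aa = str(Seq(new_codon).translate())
    return old_codon, new_codon, old_aa, new_aa


print(coding_effect(record, cds, 7, "T"))   # second base of codon GCC
print(coding_effect(record, cds, 8, "T"))   # third base of codon GCC
print(coding_effect(record, cds, 10, "A"))  # inside the intron
```

Comparing the two returned amino acids tells you whether the substitution is synonymous; a result of `None` means the position is not in an annotated exon of that CDS.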

This is exactly what I need to do. Could someone point me to the part of
the documentation, or to some code, that shows how to use Biopython to
extract that position given the chromosome? Looking at the documentation
I found

handle = Entrez.efetch(db="genome", id="9606", rettype="gb")

but I cannot find where to set the chromosome (e.g. chr="3"??).

Fortunately, all the positions that I need to look up are always in exons
and within a gene CDS.
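
One possible way to fetch just a window around a position through Entrez, sketched under some assumptions: 9606 is the NCBI taxonomy ID for human rather than a sequence record, so the chromosome is requested here by its RefSeq accession from the nucleotide database instead. `NC_000003` is assumed to be the accession for human chromosome 3, and an accession with the version suffix matching the assembly your coordinates refer to should be used in practice. The helper names, the `flank` parameter and the e-mail address are placeholders.

```python
from Bio import Entrez, SeqIO


def efetch_params(accession, position, flank=10):
    """Build efetch arguments for a window around a 1-based position."""
    return {
        "db": "nucleotide",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        # seq_start / seq_stop are 1-based, inclusive coordinates.
        "seq_start": max(1, position - flank),
        "seq_stop": position + flank,
    }


def fetch_region(accession, position, flank=10, email="you@example.org"):
    """Download the window as a SeqRecord (needs network access)."""
    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(**efetch_params(accession, position, flank))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()


# Only the parameters are printed here; calling fetch_region() would
# contact the NCBI servers.
print(efetch_params("NC_000003", 5, flank=10))
```

Requesting `rettype="gb"` for the same window should return the annotation as well, although for many positions downloading the chromosome file once is probably kinder to the NCBI servers.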
>
> You could manually download the complete chromosomes from here. I
> would get the GenBank files (which will need uncompressing):
> ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/
>
> If you have a location, you will need to check which version of the
> chromosome it refers to. Note that there are three versions of the
> human chromosomes available on the above FTP site, and there will be
> lots soon from the 1000 genomes project. You could search Entrez for
> the human chromosome, but make sure you get the right version for your
> location! I would probably do this manually (not in a script).
>
> If you parse the GenBank file using Bio.SeqIO, the gene annotations
> will be stored as SeqFeature objects. Have a look in the tutorial, and
> also this page for some tips on dealing with these:
> http://www.warwick.ac.uk/go/peter_cock/python/genbank/
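
As an illustration of working with those SeqFeature objects, here is a small sketch that lists the CDS features covering a given position. The record is built in memory as a stand-in for one parsed from a GenBank file, and the gene names, coordinates and the helper `cds_features_at` are invented for the example.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Stand-in for a record returned by Bio.SeqIO from a GenBank file.
record = SeqRecord(Seq("ACGT" * 25), id="toy")
record.features = [
    SeqFeature(FeatureLocation(10, 40, strand=1), type="gene",
               qualifiers={"gene": ["geneA"]}),
    SeqFeature(FeatureLocation(10, 40, strand=1), type="CDS",
               qualifiers={"gene": ["geneA"]}),
    SeqFeature(FeatureLocation(60, 90, strand=-1), type="CDS",
               qualifiers={"gene": ["geneB"]}),
]


def cds_features_at(record, position):
    """Return the CDS features that contain a 1-based position."""
    index = position - 1  # feature locations are 0-based, end exclusive
    return [f for f in record.features if f.type == "CDS" and index in f]


for feature in cds_features_at(record, 20):
    print(feature.qualifiers["gene"][0], feature.location)
```

For a spliced CDS the `in` test only matches the exons, so a position inside an intron is not reported. With a downloaded file the record would come from something like `SeqIO.parse(filename, "genbank")`, since a chromosome file may hold more than one record.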

I'll look into this, thank you!

>
> On a general point, you are talking about mutations - are you going to
> be re-sequencing this region in different patients to actually check
> for a mutation? Working from a single reference genome you won't be
> able to say if there is a mutation (e.g. a SNP) at a given position -
> although data from the 1000 Genomes Project could be useful.
>
Basically, the region is re-sequenced in different patients and we look at
some positions where we expect to find a particular nucleotide.
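
In plain Python that comparison might look like the sketch below. It assumes each patient's re-sequenced region is a string whose first base sits at a known chromosome coordinate; the patient names, sequences, coordinates and the function `find_mismatches` are all hypothetical.

```python
def find_mismatches(patient_regions, region_start, position, expected_base):
    """Return {patient: observed_base} for patients whose base at the
    given chromosome position differs from the expected base.

    region_start is the chromosome coordinate of the first base of every
    patient sequence (same coordinate system as position).
    """
    offset = position - region_start
    mismatches = {}
    for patient, sequence in patient_regions.items():
        if not 0 <= offset < len(sequence):
            raise ValueError("position not covered for " + patient)
        observed = sequence[offset].upper()
        if observed != expected_base.upper():
            mismatches[patient] = observed
    return mismatches


# Hypothetical re-sequenced regions, all starting at coordinate 1000.
patients = {
    "patient_1": "ACGTAC",
    "patient_2": "ACGAAC",
    "patient_3": "ACGTAC",
}
print(find_mismatches(patients, 1000, 1003, "T"))
```

This assumes the patient reads are already aligned to the reference without insertions or deletions; otherwise the offset would have to come from the alignment.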

> I hope that helps.
>
It helps a lot. Thank you.

> Peter