[Bioperl-l] Getting sequences by base pair locations

Sendu Bala bix at sendu.me.uk
Fri Jul 28 13:13:44 UTC 2006


Yuval Itan wrote:
> Hello all,
> 
> I was BLATing a few hundred human genes against the chimp genome, and 
> kept the best chimp hits for every human gene.
> I have the base pair start and end location for every chimp hit, and I 
> need to get the sequence for each of these chimp hits. Here is an 
> example for a few chimp hits bp locations:
> 
> Start  End
> 142854 144504
> 154479 155198
> 153066 167370
> 163146 163559
> 
> I have one chimp genome file (about 3GB) including all chromosomes, but 
> I could also get one file per chromosome if that would make things 
> easier. Does anyone have a script or a link for an interface that can do 
> the job?

If your genome file is in a standard format (FASTA, GenBank, etc.), read 
it with Bio::SeqIO:
http://www.bioperl.org/wiki/HOWTO:SeqIO

Then pick out the sequence object for the correct chromosome and extract 
the desired region with subseq(), which takes 1-based, inclusive start 
and end coordinates:
http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object
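
To make the two steps above concrete, here is a minimal, library-free 
sketch in Python (not BioPerl itself) of the same logic: stream the FASTA 
file one record at a time, then cut out each hit using 1-based, inclusive 
coordinates as subseq() does. The function names (read_fasta, subseq, 
extract_hits), the toy data, and the assumption that each hit is keyed by 
the chromosome name in the FASTA header are all illustrative, not 
something from your files. It also assumes your coordinates are already 
1-based and inclusive; BLAT's PSL output uses 0-based starts, so check 
which convention your hit table follows.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA stream, one record at a time."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The identifier is the first word after '>' (e.g. "chr1").
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def subseq(sequence, start, end):
    """Return the region from start to end, 1-based and inclusive."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(f"coordinates {start}-{end} fall outside 1-{len(sequence)}")
    return sequence[start - 1:end]


def extract_hits(handle, hits):
    """hits maps a chromosome name to a list of (start, end) pairs.

    Returns a dict keyed by (chromosome, start, end) holding each extracted sequence.
    """
    extracted = {}
    for name, sequence in read_fasta(handle):
        for start, end in hits.get(name, []):
            extracted[(name, start, end)] = subseq(sequence, start, end)
    return extracted


# Toy two-"chromosome" file standing in for the real genome FASTA.
demo = io.StringIO(">chr1 toy\nACGTACGTAC\nGGGTTT\n>chr2\nTTTTCCCC\n")
result = extract_hits(demo, {"chr1": [(3, 6), (11, 13)], "chr2": [(1, 4)]})
for key, sequence in sorted(result.items()):
    print(key, sequence)
```

Only one record is held in memory at a time, so this works on a single 
multi-chromosome file as well as on one file per chromosome, although 
each chromosome is still loaded whole before slicing.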

You'll also need to make sure that the chimp assembly you ran BLAT 
against is exactly the same data as in your big file; otherwise the 
coordinates won't line up with the sequence.


