[BioPython] extraction from genbank/embl files

Liam Thompson dejmail at gmail.com
Sun Apr 5 12:59:15 UTC 2009


Hi everyone

I have a list of accession numbers, which I've used to download the entire
genomic sequences of several hundred hepatitis B virus isolates. What I am
trying to do is extract 3 gene sequences from each genomic sequence, and
place each sequence in one of 3 files depending on the gene for further
analysis.

My question is whether the GenBank parser offers a shorter way to extract
specific gene sequences from GenBank files, or whether I would need to
identify the gene in each genomic isolate individually (the same gene is
annotated under a variety of names, which makes it trickier), copy the
coordinates of the gene, and then go further down the file to copy the
gene sequence itself.
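
To make the parser-based route concrete, here is a minimal sketch, assuming
a reasonably recent Biopython and assuming the records annotate each gene as
a "gene" or "CDS" feature with a /gene, /product or /note qualifier. The
alias table, the function names and the output file names are all
hypothetical placeholders; the three genes shown are only an example, and
the aliases would need to be filled in after looking at how the isolates
are actually annotated.

```python
from collections import defaultdict

# Illustrative alias table (an assumption, not taken from real records):
# map each canonical gene key to the lowercase names it may appear under.
GENE_ALIASES = {
    "S": {"s", "hbsag", "surface antigen", "surface protein"},
    "C": {"c", "hbcag", "core", "core antigen", "core protein"},
    "X": {"x", "hbx", "x protein"},
}


def canonical_gene(qualifiers):
    """Return the canonical gene key for a feature's qualifiers, or None.

    `qualifiers` maps qualifier names to lists of strings, which is how
    Biopython exposes feature qualifiers.
    """
    for key in ("gene", "product", "note"):
        for value in qualifiers.get(key, []):
            name = value.strip().lower()
            for canonical, aliases in GENE_ALIASES.items():
                if name in aliases:
                    return canonical
    return None


def extract_genes(genbank_path):
    """Yield (canonical_gene, SeqRecord) pairs from a GenBank file."""
    # Imported here so canonical_gene() can be used without Biopython.
    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    for record in SeqIO.parse(genbank_path, "genbank"):
        seen = set()  # keep only the first matching feature per gene
        for feature in record.features:
            if feature.type not in ("gene", "CDS"):
                continue
            gene = canonical_gene(feature.qualifiers)
            if gene is None or gene in seen:
                continue
            seen.add(gene)
            # extract() applies the feature's coordinates and strand,
            # including join() locations, to the parent sequence.
            sub_seq = feature.extract(record.seq)
            yield gene, SeqRecord(
                sub_seq,
                id=record.id,
                description=f"gene={gene} {feature.location}",
            )


def write_gene_files(genbank_path, prefix="hbv"):
    """Write one FASTA file per gene, e.g. hbv_S.fasta (names hypothetical)."""
    from Bio import SeqIO

    by_gene = defaultdict(list)
    for gene, gene_record in extract_genes(genbank_path):
        by_gene[gene].append(gene_record)
    for gene, gene_records in by_gene.items():
        SeqIO.write(gene_records, f"{prefix}_{gene}.fasta", "fasta")
```

Calling write_gene_files("isolates.gb") on a multi-record GenBank file (the
file name is a placeholder) would then produce one FASTA file per gene.
Isolates whose annotation matches none of the aliases are silently skipped,
so it is worth counting the records in each output file against the number
of accessions.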

I am not experienced in Python (or other languages, for that matter), but I
am trying.

Any suggestions would be greatly appreciated.

Thanks
Liam

-- 
-----------------------------------------------------------
Antiviral Gene Therapy Research Unit
University of the Witwatersrand
Faculty of Health Sciences, Room 7Q07
7 York Road, Parktown
South Africa
2193

Tel: 2711 717 2465/7
Fax: 2711 717 2395
Email: liam.thompson at students.wits.ac.za / dejmail at gmail.com


