[BioPython] extraction from GenBank/EMBL files

Sean MacEachern sean.maceach at gmail.com
Sun Apr 5 13:13:16 UTC 2009


Hi Liam,

Although it is not a Biopython solution, you should be able to use seqret from
EMBOSS to do something like what you have described. You can call seqret from
your Python script using popen and write the results to one of your three files.
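A minimal sketch of that approach, assuming EMBOSS is installed and `seqret` is on the PATH. It uses `subprocess.run`, the standard-library replacement for `os.popen`, and takes the gene coordinates as inputs (for example, copied from a record's feature table). `-sequence`, `-outseq` and `-auto` are seqret's input, output and no-prompt options; the `-sbegin`/`-send` range qualifiers are an assumption worth checking against `seqret -help`. The sketch also relies on seqret writing FASTA by default. The function and file names are hypothetical.

```python
import subprocess


def seqret_command(infile, outfile, begin, end):
    """Build a seqret command line that copies residues begin..end of
    infile into outfile (hypothetical helper, not part of EMBOSS)."""
    return [
        "seqret",
        "-sequence", infile,
        "-sbegin", str(begin),  # assumed qualifier: first residue to keep
        "-send", str(end),      # assumed qualifier: last residue to keep
        "-outseq", outfile,
        "-auto",                # run without interactive prompts
    ]


def run_seqret(infile, outfile, begin, end):
    """Run seqret; raises subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(seqret_command(infile, outfile, begin, end), check=True)
```

For example, `run_seqret("isolate.gb", "gene1_isolate.fasta", 100, 200)` would ask seqret for residues 100 to 200. seqret writes a fresh output file each time, so collecting every isolate's gene into one of the three files means appending each result yourself.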

HTH,

Sean 


On 4/5/09 8:59 AM, "Liam Thompson" <dejmail at gmail.com> wrote:

> Hi everyone
> 
> I have a list of accession numbers, which I've used to download the entire
> genomic sequences of several hundred hepatitis B virus isolates. What I am
> trying to do is extract 3 gene sequences from each genomic sequence, and
> place each sequence in one of 3 files depending on the gene for further
> analysis.
> 
> The question is whether there is a shorter way to extract specific gene
> sequences from GenBank files using the GenBank parser, or whether I would
> need to identify the gene in each genomic isolate individually (which is
> trickier because the same gene goes by a variety of names), copy the
> coordinates of the gene sequence, and then go further down the file and
> actually copy out the gene.
> 
> I'm not experienced in Python (or other languages, for that matter), but I
> am trying.
> 
> Any suggestions would be greatly appreciated.
> 
> Thanks
> Liam
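For the Biopython route the question asks about, a minimal sketch (an illustration under stated assumptions, not something proposed in the thread): parse each record with `Bio.SeqIO`, look for a `CDS` feature whose `gene`, `product` or `note` qualifier matches one of several known names, and let `SeqFeature.extract()` pull out the sequence, so no coordinates are copied by hand. The alias table, output file names and function names are hypothetical; fill `GENE_ALIASES` with the names that actually occur in your records. Biopython is imported inside `write_gene_files` so the name-matching helper can be used on its own.

```python
from collections import defaultdict

# Hypothetical alias table: each output label maps to the lowercase names
# the same gene may carry in different records. Replace with your own.
GENE_ALIASES = {
    "S": {"s", "hbsag", "surface antigen"},
    "C": {"c", "hbcag", "core protein"},
    "X": {"x", "hbx", "x protein"},
}

# Feature qualifiers searched, in order, for a gene name.
NAME_QUALIFIERS = ("gene", "product", "note")


def classify_feature(qualifiers, aliases=GENE_ALIASES):
    """Return the label whose alias set contains one of the feature's
    qualifier values (exact match after lowercasing), or None."""
    for key in NAME_QUALIFIERS:
        for value in qualifiers.get(key, []):
            name = value.strip().lower()
            for label, names in aliases.items():
                if name in names:
                    return label
    return None


def write_gene_files(genbank_paths, aliases=GENE_ALIASES):
    """Write every matching CDS to <label>.fasta; return counts per label."""
    from Bio import SeqIO  # Biopython's sequence I/O module

    counts = defaultdict(int)
    handles = {label: open(f"{label}.fasta", "w") for label in aliases}
    try:
        for path in genbank_paths:
            for record in SeqIO.parse(path, "genbank"):
                for feature in record.features:
                    if feature.type != "CDS":
                        continue
                    label = classify_feature(feature.qualifiers, aliases)
                    if label is None:
                        continue
                    # extract() handles strand and join(...) locations.
                    seq = str(feature.extract(record.seq))
                    handles[label].write(f">{record.id} {label}\n{seq}\n")
                    counts[label] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

If the records were downloaded into one multi-record GenBank file, `write_gene_files(["records.gb"])` is enough, since `SeqIO.parse` iterates over every record in the file. Matching is exact, so a free-text note that merely mentions a name will not match; a record with several CDS features carrying the same name contributes one entry per feature.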