[BioPython] Parsing problems

Carlo Bifulco carlo_bif at yahoo.com
Thu Mar 6 11:21:03 EST 2003


Hi folks,

I have been trying to automate the process of designing primers for
cancer-specific genomic translocations (my lab is switching from
qualitative to real-time PCR), and because of the number of targets I
thought of using Biopy. I work with Biopy version 1.10 on W32. I used
CygWin to compile the emboss toolkit.

Here are a few obstacles I encountered so far:

1) I have been working on Ensembl generated genomic sequences saved in
genbank format (e.g.
LOCUS       ENS:ENSG00000151702 118314 bp DNA HTG 6-MAR-2003
DEFINITION  Homo sapiens NCBI31 assembly reannotated via EnsEMBL DNA,
               chromosome 11 130076676..130194989),
but parsing them breaks the parser. A colon in the locus line and
something else in the version and comments lines seem to be the culprits.
Current bypass: wrote a few lines to massage the file before parsing it
with Biopy (i.e. removing all the breaking spots).

2) I have been unable to compile either or the cvs version under cygwin
(in both cases I get gcc related error messages).
Current bypass: wrote a few python lines to run the
cygwin-emboss eprimer3 utility from pythonW32.
Best solution: running *nix

3) Parsing of the emboss files generated without the -task 1 option
caused no problems. However, using the -task 1 option (which in addition
to the primers gives an internal oligo, which I need for my assays) broke
the parser.
Current bypass: None yet.
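
For points 2 and 3, a minimal sketch of running eprimer3 from Python and reading its report without the Biopy parser might look like the following. It rests on several assumptions: that the eprimer3 executable is reachable under the name given in `executable`, that `-task 1` behaves as described above, and that the report lists each oligo on a line of the form "NAME start length Tm GC% sequence" under a "PRODUCT SIZE" line. The function names and the sample report (including its sequences) are made up for illustration and should be checked against real output.

```python
import re
import subprocess


def build_eprimer3_command(sequence_path, output_path, executable="eprimer3"):
    """Return the argument list for an eprimer3 run that uses -task 1."""
    return [
        executable,
        "-sequence", sequence_path,
        "-outfile", output_path,
        "-task", "1",  # primers plus internal oligo, per the message above
        "-auto",       # do not prompt for missing values
    ]


def run_eprimer3(sequence_path, output_path, executable="eprimer3"):
    """Run eprimer3 and raise CalledProcessError if it exits with an error."""
    command = build_eprimer3_command(sequence_path, output_path, executable)
    subprocess.run(command, check=True)


# Assumed report layout, not taken from the original message.
PRODUCT_RE = re.compile(r"^\s*(\d+)\s+PRODUCT SIZE:\s*(\d+)")
OLIGO_RE = re.compile(
    r"^\s*(FORWARD PRIMER|REVERSE PRIMER|INTERNAL OLIGO)"
    r"\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\S+)"
)


def parse_eprimer3(text):
    """Collect one dict per primer set, keyed by oligo type."""
    results = []
    for line in text.splitlines():
        product = PRODUCT_RE.match(line)
        if product:
            results.append({"size": int(product.group(2))})
            continue
        oligo = OLIGO_RE.match(line)
        if oligo and results:
            kind, start, length, tm, gc, sequence = oligo.groups()
            results[-1][kind.lower().replace(" ", "_")] = {
                "start": int(start),
                "length": int(length),
                "tm": float(tm),
                "gc": float(gc),
                "sequence": sequence,
            }
    return results


# Invented report used only to exercise the parser.
SAMPLE = """\
# EPRIMER3 RESULTS FOR EXAMPLE

#                      Start  Len   Tm     GC%   Sequence

   1 PRODUCT SIZE: 200
     FORWARD PRIMER     101   20  60.00  50.00  ACGTACGTACGTACGTACGT

     REVERSE PRIMER     281   20  60.00  50.00  TGCATGCATGCATGCATGCA
     INTERNAL OLIGO     150   20  65.00  50.00  GATCGATCGATCGATCGATC
"""

primer_sets = parse_eprimer3(SAMPLE)
```

Matching line by line with regular expressions means an unexpected extra line, such as the internal oligo, is either picked up or skipped rather than stopping the whole parse.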

All can probably be solved with python ad hoc code, but I thought that
sharing them could be useful in teaching me what I am doing wrong.
Will be happy to provide you with the lines the parser generates by
using (debug_level=2) or with the cygwin-gcc error messages if that can
be useful.
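
As an illustration of the kind of ad hoc code meant here, the file massaging from point 1 could be sketched roughly as below. It only handles the colon in the LOCUS line; the problems in the version and comments lines are not identified in this message, so the sketch does not touch them. The function name `sanitize_locus_lines` and the underscore replacement are assumptions chosen for the example.

```python
import re

# LOCUS keyword plus padding, then the locus name, then the rest of the line.
LOCUS_RE = re.compile(r"(LOCUS\s+)(\S+)(.*)", re.DOTALL)


def sanitize_locus_lines(text, replacement="_"):
    """Replace colons in the name field of every LOCUS line.

    All other lines are passed through unchanged.
    """
    cleaned = []
    for line in text.splitlines(keepends=True):
        match = LOCUS_RE.match(line)
        if match:
            head, name, rest = match.groups()
            line = head + name.replace(":", replacement) + rest
        cleaned.append(line)
    return "".join(cleaned)


record = (
    "LOCUS       ENS:ENSG00000151702 118314 bp DNA HTG 6-MAR-2003\n"
    "DEFINITION  Homo sapiens NCBI31 assembly reannotated via EnsEMBL DNA,\n"
)
cleaned_record = sanitize_locus_lines(record)
```

Swapping the colon for a single character, rather than deleting it, keeps the remaining fields of the LOCUS line in the same columns, which may matter if the parser reads that line by position.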

Thanks,
Carlo Bifulco, MD
Dep. of Pathology
University of Florida








