[Bioperl-l] translating a GenBank file
Derek Gatherer
d.gatherer at vir.gla.ac.uk
Mon Mar 13 09:36:13 UTC 2006
Dear BioPerlers
I have a general strategy question about the following situation. I
want to take GenBank files of viral genomes (only ~100-200 kb) and
produce a translation laid out around the sequence, in a format like:
TAAACCTGTCTTTCAGACCTTGTTGGACATCCCGTACAATCAAGATGTTCCTGTATGTTG
S R C S C M L
TTTGCAGTCTGGCGGTTTGCTTTCGAGGACTATTAAGCCTTTCTCTGCAATCGTCTCCAA
F A V W R F A F E D Y M A F L C N R L Q
ATCTCTGCCCTGGAGTGATTTCAACGCCTTACACGTTGACCTGTCCGTCTAATACATCCT
I S A L E M
where the translation is above the DNA for forward-strand ORFs and
below it for complementary-strand ORFs. I initially attempted this
using EMBOSS, which has two utilities, "showseq" and "prettyseq",
that take a range of start and stop points and produce a translation
of the type above. However, neither is quite up to the job of
translating whole genomes: showseq throws an exception when ORFs
overlap (a deliberate feature), and both showseq and prettyseq seem
to have trouble with a combination of forward and reverse
translations on the same sequence (not officially confirmed as a bug
yet, but certainly not a feature).
So, before I start trying to hack EMBOSS, is there a better way to do
it in BioPerl? It occurs to me that the above format is not a
"standard", although it appears quite commonly in publications, and
that may be the major difficulty.
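
To make the target layout concrete, here is a minimal sketch in plain
Python (not BioPerl or EMBOSS). It assumes ORFs are supplied as
contiguous, 1-based inclusive (start, end, strand) tuples, such as one
might pull from GenBank CDS features, and it assumes the standard
genetic code. The names layout, revcomp and CODON_TABLE are
illustrative only and do not correspond to any existing BioPerl or
EMBOSS facility. Each amino acid is printed at the middle base of its
codon, and overlapping ORFs on the same strand are stacked on separate
rows rather than rejected.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def layout(seq, orfs, width=60):
    """Render DNA with forward translations above and reverse ones below.

    orfs: iterable of (start, end, strand), 1-based inclusive, strand +1/-1.
    """
    seq = seq.upper()
    tracks = {1: [], -1: []}  # one list of rows per strand
    for start, end, strand in orfs:
        coding = seq[start - 1:end]
        if strand == -1:
            coding = revcomp(coding)
        # Reuse the first row on this strand that has no overlapping ORF.
        for row in tracks[strand]:
            if all(end < s or start > e for s, e in row["spans"]):
                break
        else:
            row = {"chars": [" "] * len(seq), "spans": []}
            tracks[strand].append(row)
        row["spans"].append((start, end))
        for i in range(len(coding) // 3):
            aa = CODON_TABLE.get(coding[3 * i:3 * i + 3], "X")  # X = ambiguous
            # 0-based index of the codon's middle base on the forward strand
            mid = start + 3 * i if strand == 1 else end - 2 - 3 * i
            row["chars"][mid] = aa

    lines = []
    for off in range(0, len(seq), width):
        for row in reversed(tracks[1]):  # forward-strand rows above the DNA
            text = "".join(row["chars"][off:off + width]).rstrip()
            if text:
                lines.append(text)
        lines.append(seq[off:off + width])
        for row in tracks[-1]:  # reverse-strand rows below the DNA
            text = "".join(row["chars"][off:off + width]).rstrip()
            if text:
                lines.append(text)
        lines.append("")  # blank line between blocks
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy sequence: a forward ORF at 1..9 and a reverse-strand ORF at 10..18.
    print(layout("ATGGCCTAACTATTTCAT", [(1, 9, 1), (10, 18, -1)]))
```

Spliced (join) features and alternative genetic codes are not handled
in this sketch; they would need the coordinates expanded per exon and
a different codon table.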
All suggestions gratefully received.
Derek
_________________________
Derek Gatherer Ph.D. Cert.Ed.
Computer Officer
Institute of Virology
Church Street
Glasgow G11 5JR
Tel: 0141-330-6268
Fax: 0141-337-2236