[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?
clarsen at vecna.com
Tue Oct 27 12:33:01 EDT 2009
I am attempting to find some solutions to a DB loading problem we are
encountering in viruses. It is multifold:
Some viruses churn out a polyprotein rather than individual peptides;
further, they also slip the ribosome, so a source nucleotide is used
more than once in translation (ribosome halts, backs up one
nucleotide, and continues in a new frame); and finally we have
post-translational processing into mature peptides. The main thing is that
the mature peptide is contained as a subset of the whole parent
polyprotein, but is not provided as a single file in GBK for each
mat_peptide CDS. We have to get that in order to run algorithms on the
relevant processed proteins. Therefore we cannot directly load into
GUS, but rather have to choose how to get the mat_peptide sequence.
Actually I think the viruses know that, and are just messing with us
out of spite, since we have iPods and they don't. Anyway, from anyone
who has encountered this I seek guidance.
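
To make the slippage part concrete, here is a minimal sketch in plain Python (not BioPerl) of translating across a -1 slip. It assumes the CDS is described by GenBank-style 1-based inclusive segments, where the second segment starts on the last nucleotide of the first, so that nucleotide is read twice. The toy genome, the coordinates, and the helper names splice and translate are made up for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice(genome, segments):
    """Concatenate 1-based inclusive (start, end) segments of the genome."""
    return "".join(genome[start - 1:end] for start, end in segments)


def translate(nt):
    """Translate in frame 0, ignoring any trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical 20 nt genome with a -1 slip at position 9.
genome = "ATGGCTTTAAACGGTGCTAA"
cds_segments = [(1, 9), (9, 20)]  # position 9 appears in both segments

slipped = translate(splice(genome, cds_segments))
naive = translate(genome)  # what you get if the slip is ignored
print(slipped, naive)
```

The spliced CDS is one nucleotide longer than the genomic span it covers, which is why a plain start-to-stop translation of the nucleotide record does not reproduce the polyprotein.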
We have as choices:
1. Get the locations of mature peptide children in /Protein/
   carve the mat_peptide sequence out of the whole polyprotein translation
   check that the mat_peptide is in fact an identical subset of the
   parent polyprotein
2. Use the locations of starts and stops in /Nucleotide/
   translate that, using the slippage information
   get mature peptides that line up exactly to the parent polyprotein
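
The two choices can be cross-checked against each other. The sketch below is plain Python rather than BioPerl, and everything in it is an assumption for illustration: the toy genome, the coordinates, and the function names carve_from_protein, translate_from_nucleotide and lines_up. It assumes 1-based inclusive coordinates and that a mat_peptide spanning the slip site carries its own two-segment nucleotide location.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate in frame 0, ignoring any trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def carve_from_protein(polyprotein, start, end):
    """Choice 1: cut the peptide out using protein coordinates."""
    return polyprotein[start - 1:end]


def translate_from_nucleotide(genome, segments):
    """Choice 2: translate the peptide's own nucleotide segments."""
    return translate("".join(genome[s - 1:e] for s, e in segments))


def lines_up(polyprotein, peptide, start):
    """True if the peptide matches the polyprotein exactly at start."""
    return polyprotein[start - 1:start - 1 + len(peptide)] == peptide


# Hypothetical record: CDS with a -1 slip at position 9.
genome = "ATGGCTTTAAACGGTGCTAA"
polyprotein = translate_from_nucleotide(genome, [(1, 9), (9, 20)])

# Hypothetical mat_peptide spanning the slip site.
by_protein = carve_from_protein(polyprotein, 3, 6)
by_nucleotide = translate_from_nucleotide(genome, [(7, 9), (9, 17)])
ignoring_slip = translate_from_nucleotide(genome, [(7, 17)])

print(by_protein, by_nucleotide, ignoring_slip)
print(lines_up(polyprotein, by_nucleotide, 3))
print(lines_up(polyprotein, ignoring_slip, 3))
```

In this toy case both choices give the same peptide only when the nucleotide route honors the slip. Choice 1 avoids the frame bookkeeping entirely but trusts the deposited polyprotein translation, so the check in lines_up is the step that catches a disagreement between the two.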
If you know of BioPerl sequence handling support for this, I would
love to hear more. Clearly this is a nonstandard thingamabob.
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133