[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?

Chris Larsen clarsen at vecna.com
Thu Nov 12 17:22:26 UTC 2009


All,

This is a short follow-up to the earlier thread on computing mature
peptide sequences for viruses. The topic has gone quiet for the time
being while we sort out some problems with the source data. The
biopython effort and the contributors on this list have given good
guidance, and we now have working scripts (thanks mostly to pcock).
However, the source data on which everything relies is suspect:

   mat_peptide	15118..16914	<===
		/product="nsp13"	
		/note="helicase"

I can tell you the virus community does not want to rely heavily on
those position numbers. Furthermore, we have found fewer complete
source genomes for viruses than for bacteria, and more virus-to-virus
variation in which data fields are annotated in the GBK file (Gene,
CDS, ORF, Protein, Polyprotein, mat_peptide, db_xref). In fact, the
community will have to come together significantly on how these
molecules are defined in public repositories before a mature scripting
effort becomes reliable, public, and well received. Because of the
variation among viruses, it's not even clear at this point what a
'gene' is. I will let you know how we proceed once more sequence data
has been fully analyzed and we can think about turning any Perl-based
solution into a new viral protein module.
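
To illustrate the kind of check we have been running on the source
data, here is a minimal sketch in plain Python (standard library only,
not the BioPerl or Biopython parsers). It tallies which feature keys a
GenBank feature table actually uses and tests whether each simple
mat_peptide span is a whole number of codons. The function names
scan_features and check_mat_peptides are hypothetical, and every
coordinate in SAMPLE other than 15118..16914 is made up for the
example. It assumes well-formed flat-file layout (feature keys indented
by five spaces) and does not attempt join() or complement() locations,
so slippage cases are only flagged, not resolved.

```python
import re
from collections import Counter

# A feature line: exactly five spaces, the feature key, then the location.
# Qualifier lines are indented further and therefore never match.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")
# Only plain "start..end" spans (optionally with < or > partial markers).
SIMPLE_SPAN_RE = re.compile(r"^<?(\d+)\.\.>?(\d+)$")

# Illustrative input; only 15118..16914 comes from a real record.
SAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..30000
     CDS             join(1..10,10..30)
     mat_peptide     15118..16914
                     /product="nsp13"
                     /note="helicase"
     mat_peptide     100..200
ORIGIN
        1 atgc
//
"""


def scan_features(text):
    """Return (key, location) pairs from the FEATURES block(s) of a flat file."""
    pairs = []
    in_features = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if in_features:
            m = FEATURE_RE.match(line)
            if m:
                pairs.append((m.group(1), m.group(2)))
    return pairs


def check_mat_peptides(pairs):
    """Return (location, length, in_frame) for each mat_peptide feature.

    Length and in_frame are None for locations this sketch cannot parse.
    """
    results = []
    for key, loc in pairs:
        if key != "mat_peptide":
            continue
        m = SIMPLE_SPAN_RE.match(loc)
        if not m:
            results.append((loc, None, None))  # join(), complement(), etc.
            continue
        start, end = int(m.group(1)), int(m.group(2))
        length = end - start + 1  # GenBank coordinates are 1-based, inclusive
        results.append((loc, length, length % 3 == 0))
    return results
```

Calling Counter(key for key, _ in scan_features(text)) on each record
gives a quick picture of how much the annotated fields differ from one
virus to the next. A span that is not a multiple of three is not
necessarily wrong, but it is a cheap signal that the position numbers
deserve a second look before anything is translated.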

Thanks,

Chris

-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525



