[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?

Chris Larsen clarsen at vecna.com
Tue Oct 27 20:07:55 UTC 2009


Peter,

This is a good strategy when the gi is given. However, I failed to
mention that the example I gave is turning out to be unusual (15%?):
most of the virus 'mature peptides' we will apply this analysis to do
not in fact have a gi number or any other unique identifier associated
with them. There are thousands of dengue virus files to be processed
to give mature proteins.

I should have mentioned this. Hence the problem: we can't look it up
because only the parent polyprotein has a gi. There's nothing to look
up /by/ in most cases. So we still have to build the set of proteins
that are cleaved out of every polyprotein, using local, high-throughput
methods, from the information that is available (sadly, kind of a
runaround; it should be in the GenBank entry).
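
As a rough illustration of what "building it out of the available
information" could look like, here is a minimal Python sketch. It
assumes the simplest case only: a single contiguous CDS on the forward
strand, no ribosomal slippage or joined locations, and mat_peptide
coordinates that are in frame with the CDS and exclude the stop codon.
The function name, the toy sequence, and the coordinates are all
hypothetical; in practice the coordinates and the polyprotein
translation would come from a GenBank parser.

```python
from typing import Dict, List, Tuple


def cleave_polyprotein(
    polyprotein: str,
    cds_start: int,
    mat_peptides: List[Tuple[str, int, int]],
) -> Dict[str, str]:
    """Slice mature peptides out of a polyprotein translation.

    polyprotein  -- the /translation of the parent CDS (no stop symbol)
    cds_start    -- 1-based nucleotide position where the CDS begins
    mat_peptides -- (name, start, end) tuples using 1-based, inclusive
                    nucleotide coordinates, as written in a GenBank file
    """
    peptides = {}
    for name, start, end in mat_peptides:
        offset = start - cds_start      # nucleotides from CDS start
        length = end - start + 1        # nucleotides in the peptide
        if offset < 0 or offset % 3 != 0 or length % 3 != 0:
            raise ValueError(f"{name}: not in frame with the CDS")
        aa_start = offset // 3          # 0-based residue index
        aa_end = aa_start + length // 3
        if aa_end > len(polyprotein):
            raise ValueError(f"{name}: extends past the polyprotein")
        peptides[name] = polyprotein[aa_start:aa_end]
    return peptides


# Toy data (made up): a 10-residue "polyprotein" whose CDS starts at
# nucleotide 101 and is cleaved into two mature peptides.
toy_polyprotein = "MKTAYIAKQR"
toy_features = [("pepA", 101, 112), ("pepB", 113, 130)]
toy_result = cleave_polyprotein(toy_polyprotein, 101, toy_features)
```

Records with slippage or joined CDS locations would need the location
segments mapped individually, which this sketch does not attempt.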

Chris


On Oct 27, 2009, at 3:54 PM, Peter wrote:

> On Tue, Oct 27, 2009 at 7:15 PM, Chris Larsen <clarsen at vecna.com>  
> wrote:
>>
>> Hello Peter!
>>
>> For instance, check this:
>> http://www.ncbi.nlm.nih.gov/nuccore/NC_001959
>> ...
>>
>> No mat_peptide sequence is given. We want that...
>
> Looking at the GenBank file displayed, the mat_peptide features
> (mature peptides) do not include a translation entry (like the parent
> CDS feature does). However, they do have protein IDs - which are
> actually links in the HTML version.
>
> This leads me to suggest a third option as an alternative to the two
> ideas you outlined. You could parse the GenBank file(s), and for each
> mat_peptide feature look up the protein ID via Entrez EFetch (e.g. as
> a FASTA file, or a GenPept file). If you only have a relatively small
> number of viruses and proteins this is probably going to be pretty
> easy. At least, I could do it in Biopython and I am sure the same is
> true with the BioPerl GenBank parser and their EFetch interface.
>
> However, for a large dataset, handling it all locally (your options
> (1) and (2)) sounds best.
>
> Peter


-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525



