[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?
Chris Larsen
clarsen at vecna.com
Tue Oct 27 16:07:55 EDT 2009
Peter,
This is a good strategy when the gi is given. However, I failed to
mention that the example I gave turns out to be unusual (perhaps 15%
of cases?): most of the virus 'mature peptides' we will apply this
analysis to do not in fact have a gi number or unique identifier
associated with them. There are thousands of dengue virus files to be
processed to give mature proteins.
I should have mentioned this. Hence the problem: we can't look it up
because only the parent polyprotein has a gi. There's nothing to look
up /by/ in most cases. So we still have to build the set of proteins
that are cleaved out of every polyprotein, using local, high-throughput
methods, from the information that is available in the record
(sadly, kind of a runaround; it should be in the GenBank entry).
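
As a rough illustration of the local approach, here is a minimal
Python sketch that cuts one mature peptide out of the parent
polyprotein using only coordinates already in the record. It assumes
the simplest case: a single contiguous CDS on the forward strand, no
ribosomal slippage (no join(...) location), 1-based inclusive
nucleotide coordinates as GenBank prints them, and a CDS /translation
that excludes the stop codon. The function name, the toy sequence and
the coordinates are all hypothetical, not taken from any real entry.

```python
def slice_mature_peptide(cds_start, cds_translation, pep_start, pep_end):
    """Return the mat_peptide protein sequence cut from the CDS translation.

    cds_start          first nucleotide of the parent CDS (1-based)
    cds_translation    the /translation string of the parent CDS
    pep_start, pep_end nucleotide range of the mat_peptide (1-based, inclusive)

    Assumes a contiguous forward-strand CDS; joined or slipped locations
    would need a proper genome-to-CDS coordinate mapping instead.
    """
    offset = pep_start - cds_start      # nucleotides from CDS start to peptide start
    length = pep_end - pep_start + 1    # peptide length in nucleotides
    if offset < 0:
        raise ValueError("mat_peptide starts before the CDS")
    if offset % 3 != 0 or length % 3 != 0:
        raise ValueError("mat_peptide is not in frame with the CDS")
    first = offset // 3                 # index of the first residue
    last = first + length // 3          # index one past the last residue
    if last > len(cds_translation):
        raise ValueError("mat_peptide extends past the CDS translation")
    return cds_translation[first:last]


# Toy example: CDS begins at nucleotide 11, mat_peptide spans 17..28.
print(slice_mature_peptide(11, "MKTAYIAKQR", 17, 28))  # TAYI
```

Run over every mat_peptide feature of every file, this would give the
cleaved protein set without needing an identifier to look anything up.
The in-frame checks are there so that records with slippage or odd
coordinates fail loudly rather than yield a wrong peptide.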
Chris
On Oct 27, 2009, at 3:54 PM, Peter wrote:
> On Tue, Oct 27, 2009 at 7:15 PM, Chris Larsen <clarsen at vecna.com>
> wrote:
>>
>> Hello Peter!
>>
>> For instance, check this:
>> http://www.ncbi.nlm.nih.gov/nuccore/NC_001959
>> ...
>>
>> No mat_peptide sequence is given. We want that...
>
> Looking at the GenBank file displayed, the mat_peptide features
> (mature peptides) do not include a translation entry (like the parent
> CDS feature does). However, they do have protein IDs - which are
> actually links in the HTML version.
>
> This leads me to suggest a third option as an alternative to the two
> ideas you outlined. You could parse the GenBank file(s), and for each
> mat_peptide feature look up the protein ID via Entrez EFetch (e.g. as
> a FASTA file, or a GenPept file). If you only have a relatively small
> number of viruses and proteins this is probably going to be pretty
> easy. At least, I could do it in Biopython and I am sure the same is
> true with the BioPerl GenBank parser and their EFetch interface.
>
> However, for a large dataset, handling it all locally (your options
> (1) and (2)) sounds best.
>
> Peter
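
For the minority of records where a mat_peptide does carry a protein
ID, the EFetch lookup suggested above might look roughly like the
sketch below. It uses only the Python standard library and the public
NCBI E-utilities efetch endpoint. The helper names and the placeholder
ID are hypothetical, and the network call is defined but not run here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(protein_id):
    """Build the EFetch URL that requests one protein record as FASTA text."""
    params = {
        "db": "protein",
        "id": protein_id,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(protein_id):
    """Download the FASTA record (needs network access; not called below)."""
    with urlopen(efetch_fasta_url(protein_id)) as handle:
        return handle.read().decode("utf-8")


# "EXAMPLE_ID.1" is a placeholder, not a real protein accession.
print(efetch_fasta_url("EXAMPLE_ID.1"))
```

For thousands of files this means one request per peptide, which is
part of why the local route looks more practical at that scale.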
--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525