[Bioperl-l] question about positioning peptide in a full protein sequence

Mingwei Min mm809 at cam.ac.uk
Mon Feb 21 20:23:20 UTC 2011


Hi Frank,

Yes, this is MS data for phosphorylation sites, as well as
ubiquitination sites. I am now dealing with some published data,
some of which show only the peptide (and yes, some with the modification
probabilities) without coordinates in the sequence. The approach Chris
suggested is already very straightforward. I'll give it a try. Thank you
very much for your help.

Cheers,

Mingwei

2011/2/21 Frank Schwach <fs5 at sanger.ac.uk>:
> Hi Mingwei,
>
> I guess this is MS data for phosphorylation sites? We are doing the same
> here. I don't know what software you are using in your MS pipeline, but
> it may already map the peptides to the full-length protein for you. If
> not, you probably get peptide sequences with the probabilities of a site
> carrying a phosphate (or whatever post-translational modification)
> encoded in the string, e.g. the data I'm working with will show me
> something like "..LKS[0.99]S[0.01]..." to indicate probabilities of 99%
> and 1% of those two serines being modified. You then have to extract
> that data from the peptide string using a regex. Then you can identify
> the most probable site within the string and map the peptide string to
> the full-length protein sequence using index (or a regex) as Chris
> suggested. You can then calculate the position of the actual modified
> site from the match position of the peptide and the position of the site
> within the peptide. I don't think there is any ready-made solution for
> this, as it is basically just string matching, but please do let me
> know if you get stuck and I can help you further.
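
The steps Frank describes can be sketched roughly as follows. This is only
an illustration under stated assumptions: the sub names
parse_annotated_peptide and map_best_site are hypothetical, the sequences
are made up, and the annotation format is assumed to be exactly
"residue[probability]" as in his example. Only the first occurrence of the
peptide in the protein is considered.

```perl
use strict;
use warnings;

# Split an annotated peptide such as "LKS[0.99]S[0.01]" into the bare
# sequence and a list of [offset_in_peptide, probability] pairs.
# Offsets are 0-based. The "X[p]" format is an assumption.
sub parse_annotated_peptide {
    my ($annotated) = @_;
    my ($bare, @sites) = ('');
    while ($annotated =~ /\G([A-Z])(?:\[([0-9.]+)\])?/gc) {
        push @sites, [length($bare), $2] if defined $2;
        $bare .= $1;
    }
    # Refuse input that the pattern could not consume completely.
    die "Unparsed text in '$annotated'\n"
        if (pos($annotated) // 0) != length($annotated);
    return ($bare, @sites);
}

# Map the most probable site onto the protein. Returns a 1-based residue
# number, or nothing if there is no annotated site or no exact match.
sub map_best_site {
    my ($protein, $annotated) = @_;
    my ($bare, @sites) = parse_annotated_peptide($annotated);
    return unless @sites;
    my ($best) = sort { $b->[1] <=> $a->[1] } @sites;   # highest probability first
    my $start = index($protein, $bare);                 # 0-based, -1 if absent
    return if $start == -1;
    return $start + $best->[0] + 1;                     # convert to 1-based
}

# Made-up sequences, for illustration only.
my $protein = 'MKTLLKSSPQRA';
print map_best_site($protein, 'LKS[0.99]S[0.01]PQR'), "\n";   # prints 7
```

If a peptide can occur more than once in the protein, the first-hit
behaviour above would need to be replaced by a loop over all matches.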
>
> Cheers,
>
> Frank
>
>
>
> On Sun, 2011-02-20 at 20:57 -0600, Chris Fields wrote:
>> If this is a direct string match (no ambiguity), just use Perl's index function:
>>
>>        index STR,SUBSTR,POSITION
>>        index STR,SUBSTR
>>                The index function searches for one string within another, but
>>                without the wildcard-like behavior of a full regular-expression
>>                pattern match.  It returns the position of the first occurrence
>>                of SUBSTR in STR at or after POSITION.  If POSITION is omitted,
>>                starts searching from the beginning of the string.  POSITION
>>                before the beginning of the string or after its end is treated
>>                as if it were the beginning or the end, respectively.  POSITION
>>                and the return value are based at 0 (or whatever you've set the
>>                $[ variable to--but don't do that).  If the substring is not
>>                found, "index" returns one less than the base, ordinarily "-1".
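
As a minimal sketch of how index could be applied to this problem: the sub
name peptide_positions and the sequences below are made up for
illustration. Because index is 0-based, 1 is added to report conventional
1-based residue numbers, and the POSITION argument is used to pick up
repeated occurrences.

```perl
use strict;
use warnings;

# Return the 1-based start positions of every exact occurrence of
# $peptide in $protein (hypothetical helper, not part of BioPerl).
sub peptide_positions {
    my ($protein, $peptide) = @_;
    my @starts;
    return @starts unless length $peptide;   # an empty peptide would loop forever
    my $pos = 0;
    while ((my $hit = index($protein, $peptide, $pos)) != -1) {
        push @starts, $hit + 1;   # index() is 0-based; report 1-based
        $pos = $hit + 1;          # step one residue on so overlapping hits are found
    }
    return @starts;
}

# Made-up sequences, for illustration only.
my $protein = 'MKTLLKSSPQRLKSSA';
my $peptide = 'LKSS';
my @starts  = peptide_positions($protein, $peptide);
print "starts: @starts\n";        # prints "starts: 5 12"
```

An empty list means the peptide does not match exactly, in which case a
similarity search may still be needed.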
>>
>> Also see here:
>>
>> http://perlmeme.org/howtos/perlfunc/index_function.html
>>
>> chris
>>
>> On Feb 20, 2011, at 4:28 PM, Mingwei Min wrote:
>>
>> > Hi Dave,
>> >
>> > Thank you for your suggestion. When I said "too complex for this simple
>> > job", I just thought that there might be some particular module that
>> > could do this straightforwardly. I'll give BLAST a try anyway.
>> > Thank you.
>> >
>> > Mingwei
>> >
>> > 2011/2/20 Dave Messina <David.Messina at sbc.su.se>:
>> >> Hi Mingwei,
>> >> Please remember to "reply all" so others on the mailing list can follow the
>> >> conversation.
>> >> Unless you have some other way of mapping the coordinates of the
>> >> sequence with the post-translational sites to the coordinates of the full
>> >> sequence, I think you'll have to do a similarity search of some form.
>> >> BLAST may not be best for this, given that it's sloppy with the ends of an
>> >> alignment, but there are plenty of options for BLAST that may improve your
>> >> results. Again, you'll need to be specific about your problem for us to
>> >> help. I don't know what "too complex for this simple job" means. Is it too
>> >> slow? Are you getting too many hits?
>> >>
>> >>
>> >> Dave
>> >>
>> >>
>> >> On Sun, Feb 20, 2011 at 22:35, Mingwei Min <mm809 at cam.ac.uk> wrote:
>> >>>
>> >>> Hi Dave,
>> >>>
>> >>> Sorry for not making it clear. Yes, I just want to get the coordinates
>> >>> of the post-translational sites out of a protein sequence. What I
>> >>> have now is the peptide sequence with a marker on the modified
>> >>> residue. What should I do to map the peptides to the whole protein
>> >>> sequence and get the coordinates? The only way I could come up with is
>> >>> BLAST, but it seems to be too complex for this simple job.
>> >>>
>> >>> Many thanks,
>> >>>
>> >>> Mingwei
>> >>>
>> >>> 2011/2/20 Dave Messina <David.Messina at sbc.su.se>:
>> >>>> Hi Mingwei,
>> >>>> I'm not sure what you mean by "positioning" here. Do you want to get the
>> >>>> coordinates of the post-translational sites out of a protein sequence
>> >>>> database record? Or do you want to draw the post-translational sites on
>> >>>> a picture of the protein sequence? Or something else entirely?
>> >>>>
>> >>>> Dave
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Sat, Feb 19, 2011 at 15:53, Mingwei Min <mm809 at cam.ac.uk> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I am trying to position some post-translational modification sites,
>> >>>>> which are marked in peptides, in a full-length protein sequence. Would
>> >>>>> anyone be kind enough to tell me the module I could use for this?
>> >>>>>
>> >>>>> Many thanks
>> >>>>>
>> >>>>> Mingwei
>> >>>>> _______________________________________________
>> >>>>> Bioperl-l mailing list
>> >>>>> Bioperl-l at lists.open-bio.org
>> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>>>
>> >>>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Mingwei Min  PhD student
>> > University of Cambridge
>> > Department of Genetics
>> > Downing St
>> > CB2 3EH
>> > UK
>> >
>>
>>
>
>
>
