[Biopython] Question: Searching for RNA motifs with PSSM

Peter Cock p.j.a.cock at googlemail.com
Fri Dec 15 07:05:00 EST 2023


See also this discussion for proteins:

https://github.com/biopython/biopython/issues/3636

Having the code handle RNA or DNA looks very straightforward in comparison
(e.g. treating U and u the same as T and t in the C code).
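
As a rough illustration of both points (mapping U to T before searching, and treating U/u like T/t when scoring), here is a small standalone sketch in plain Python. It is not Biopython's implementation and does not touch Bio/motifs/matrix.py or _pwm.c. The helper names (to_dna, build_pssm, search), the example motif instances, the target sequence, the pseudocount and the threshold are all made up for the example.

```python
import math

# Map RNA uracil to DNA thymine; case is handled separately by upper().
RNA_TO_DNA = str.maketrans("Uu", "Tt")


def to_dna(sequence):
    """Return the sequence upper-cased with every U replaced by T."""
    return sequence.translate(RNA_TO_DNA).upper()


def build_pssm(instances, pseudocount=0.5, background=0.25):
    """Build a log-odds matrix (one dict per position) from equal-length instances.

    Hypothetical helper: uniform background and a flat pseudocount are assumed.
    """
    instances = [to_dna(s) for s in instances]
    length = len(instances[0])
    pssm = []
    for i in range(length):
        column = [s[i] for s in instances]
        total = len(column) + 4 * pseudocount
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm


def search(pssm, sequence, threshold):
    """Yield (position, score) for forward-strand windows scoring >= threshold."""
    dna = to_dna(sequence)  # U/u are scored exactly like T/t
    width = len(pssm)
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous letters
        score = sum(column[base] for column, base in zip(pssm, window))
        if score >= threshold:
            yield start, score


# Made-up RNA motif instances and target sequence, for illustration only.
rna_instances = ["UGUAAAUA", "UGUAUAUA", "UGUACAUA"]
rna_target = "ggcaUGUAAAUAccuuguauauag"

pssm = build_pssm(rna_instances)
hits = list(search(pssm, rna_target, threshold=10.0))
positions = [position for position, _ in hits]
```

Because U and T end up in the same column of the matrix, the hits are identical whether the input is written as RNA or as DNA. The same to_dna() conversion could be applied to motif instances and target sequences before handing them to Bio.motifs, which is the short-term workaround discussed in this thread.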

Peter

On Fri, Dec 15, 2023 at 10:33 AM Váczy-Földi Máté <
vaczy.foldi.mate at semmelweis.hu> wrote:

> Dear Peter,
>
>
> Thank you very much for the suggestion! I will go ahead with this
> method.
>
>
> Best wishes,
>
> Máté
> On 2023-12-15 at 10:16, Peter Cock wrote:
>
> Hello Máté,
>
> I see you are referring to the PositionSpecificScoringMatrix class
> defined in Bio/motifs/matrix.py which does indeed appear to be DNA
> only. I can't comment on any drawbacks in generalizing that code
> (I can see how I would attempt this), but your first idea is what I would
> have suggested in the short term - map any U to T in your motifs and
> sequences to be searched (i.e. treat as DNA).
>
> Peter
>
> On Thu, Dec 14, 2023 at 4:18 PM Váczy-Földi Máté <
> vaczy.foldi.mate at semmelweis.hu> wrote:
>
>> Dear Mailing List Members,
>>
>>
>> I would like to ask a question related to the Bio.motifs package.
>>
>> I am currently working on a project where I need to find RNA motifs in RNA
>> sequences. After consideration we have decided to search for the motif
>> occurrences using PSSMs, and I would like to implement this using
>> Biopython. I looked at the relevant code in the matrix.py file and
>> saw that the PSSM calculate method is hard-coded to work only with
>> DNA. There is also a note saying "the sequence can only be a DNA sequence".
>>
>> My questions are:
>>
>>    1. Would it be safe to replace all Us with Ts in the sequences/PSSMs
>>    and run the search that way? (I have seen one example of someone doing this
>>    while searching.)
>>    2. Or would it be possible for me to modify the code to work with RNA
>>    by replacing the Ts with Us in the code (or in a more sophisticated way
>>    providing an option for both)?
>>
>> For the latter I understand that I have to modify the _pwm.c code too. I
>> am not experienced in C, but what I gathered by looking at that code, it
>> should not be a big problem.
>>
>> I am just looking for some confirmation that I am not overlooking some
>> computational or biology related reason why the above mentioned solutions
>> are not possible.
>>
>>
>> Thank you in advance for your kind help!
>>
>>
>> Best wishes,
>>
>> Máté Váczy-Földi
>> _______________________________________________
>> Biopython mailing list  -  Biopython at biopython.org
>> https://mailman.open-bio.org/mailman/listinfo/biopython
>>
>