[Biopython-dev] 5/4 active questions tagged biopython - Stack Overflow

Feed My Inbox updates at feedmyinbox.com
Wed May 4 04:37:19 UTC 2011


// Finding/Replacing substrings with annotations in an ASCII file in Python
// May 3, 2011 at 9:14 AM

http://stackoverflow.com/questions/5870012/finding-replacing-substrings-with-annotations-in-an-ascii-file-in-python
Hello Everyone,

I'm having a small coding issue in a bioinformatics project. My task is to extract motif sequences from a database and use them to annotate a sequence alignment file. The alignment file is plain text, so the annotation will not be elaborate: at most, the extracted sequences will be replaced with asterisks in the alignment file itself.

I have a script that scans the database file, extracts all the sequences I need, and writes them to an output file. Given a query, I now need to read these sequences and match them to their corresponding substrings in the ASCII alignment files. Finally, every occurrence of a motif sequence (a substring of one very large string of characters) should be replaced, so that a motif such as XXXXXXX becomes a run of asterisks.

The code I am using goes like this (11SGLOBULIN is the name of the protein entry in the database):

motif_file = open('/users/myfolder/final motifs_11SGLOBULIN','r')
align_file = open('/Users/myfolder/alignmentfiles/11sglobulin.seqs', 'w+') 
finalmotifs = motif_file.readlines()
seqalign = align_file.readlines() 


for line in seqalign:
    if motif[i] in seqalign:  # I have stored all motifs in a list called "motif"
        replace(motif, '*****') 


But instead of replacing each string with a sequence of asterisks, it deletes the entire file. Can anyone see why this is happening? 

I suspect the problem is that my ASCII file is essentially one very long string of amino acids, and Python may not be able to replace a particular substring hidden within such a long string.
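
For reference, here is a minimal sketch of the replacement described above. It is not the original script. It assumes the motif file holds one motif per line and that no motif is split across a line break in the alignment file. The names mask_motifs, annotate_alignment and out_path are hypothetical. One likely cause of the emptied file is the 'w+' mode, which truncates a file as soon as it is opened; the sketch reads with mode 'r' and writes the result to a separate file. It also uses one asterisk per residue rather than a fixed '*****', which is a design choice to keep alignment columns in place.

```python
def mask_motifs(text, motifs):
    """Return text with every occurrence of each motif replaced by asterisks."""
    # Longest motifs first, so a short motif cannot break up a longer one.
    for motif in sorted(set(motifs), key=lambda m: (-len(m), m)):
        if motif:  # skip empty strings, which would match at every position
            # Strings are immutable: str.replace returns a new string.
            text = text.replace(motif, '*' * len(motif))
    return text


def annotate_alignment(motif_path, align_path, out_path):
    """Read motifs and an alignment, write the masked alignment to out_path."""
    # Assumption: one motif per line; blank lines are ignored.
    with open(motif_path, 'r') as motif_file:
        motifs = [line.strip() for line in motif_file if line.strip()]
    # Mode 'r' leaves the alignment file untouched.
    with open(align_path, 'r') as align_file:
        alignment = align_file.read()
    # Write to a separate file so the original alignment is preserved.
    with open(out_path, 'w') as out_file:
        out_file.write(mask_motifs(alignment, motifs))
```

Substring replacement works regardless of how long the string is, so the length of the alignment should not be an obstacle in itself.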


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=active
