[Biopython-dev] 4/11 active questions tagged biopython - Stack Overflow

Feed My Inbox updates at feedmyinbox.com
Mon Apr 11 04:42:29 UTC 2011


// Script for parsing a biological sequence from a public database in Python
// April 10, 2011 at 3:33 PM

http://stackoverflow.com/questions/5614180/script-for-parsing-a-biological-sequence-from-a-public-database-in-python
Greetings to the Stack Overflow community,

I am currently taking a bioinformatics module as part of a biomedical degree (I am basically a Python newbie), and the following task is required as part of a Python programming assignment:

Extract motif sequences: amino acid sequences (strings, in programming terms) excised by algorithms that build a multiple sequence alignment and then iteratively scan a database to find the best-conserved sequences. The ultimate aim is to infer functional significance from such "motifs".

These motifs are stored in a public database, in files with several data fields for each protein (UniProt ID, accession number, and the alignment itself, stored in a hyperlinked .seq file). Only one field is of interest here, called "extracted motif sets". My question is how to write a script that parses the "motif strings" and writes them to a file. Very crudely, I believe the pseudocode would run something along these lines:

with open("lysozyme.seq") as motif_file:  # database files are saved as .seq
    for line in motif_file:
        # assumes the field of interest is labelled "extracted motif sets"
        if "extracted motif sets" in line:
            pass  # motif extracting code goes here


There are well over a couple hundred .seq files, so I imagine a looping technique would be needed to scan a good part of the database.
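
Such a loop could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the function name `scan_directory` is hypothetical, the .seq files are assumed to sit together in one local directory, and the field of interest is assumed to appear on lines containing the literal text "extracted motif sets". The real file layout may well differ.

```python
import glob
import os

# Assumed label of the field of interest; the real files may mark it differently.
MARKER = "extracted motif sets"


def scan_directory(directory, marker=MARKER):
    """Map each .seq file name in `directory` to its lines containing `marker`."""
    matches = {}
    # glob picks up every file ending in .seq; sorting keeps the order stable.
    for path in sorted(glob.glob(os.path.join(directory, "*.seq"))):
        with open(path) as handle:
            found = [line.rstrip("\n") for line in handle if marker in line]
        if found:
            matches[os.path.basename(path)] = found
    return matches
```

The returned dictionary could then be written out to a single results file, one line per match.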

The excised motifs are in a separate file, linked from the .seq file by a hyperlink, but they are of course subsequences of the entries in the multiple sequence alignment (.seq) files. Is there a way to pick up the motif sequences (strings) from one file and then search the alignment files for the same sequences?
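
One way to picture that search is a plain substring lookup. The sketch below is illustrative only: `locate_motifs` is a hypothetical name, the alignment is assumed to have already been read into a dictionary mapping sequence IDs to aligned strings, and "-" is assumed to be the gap character. Gaps are removed before searching, because a motif may span a gap in the aligned form.

```python
def locate_motifs(motifs, alignment):
    """Return (motif, sequence ID, start) for every occurrence of each motif.

    `alignment` maps sequence IDs to aligned strings; "-" is assumed to be
    the gap character. Start positions are 0-based indices into the
    ungapped sequence.
    """
    hits = []
    for motif in motifs:
        for seq_id, aligned in alignment.items():
            ungapped = aligned.replace("-", "")
            start = ungapped.find(motif)
            while start != -1:
                hits.append((motif, seq_id, start))
                # Continue one position later so overlapping hits are found too.
                start = ungapped.find(motif, start + 1)
    return hits
```

For example, `locate_motifs(["AAG"], {"P2": "MKT--AAG"})` reports the motif at position 3 of the ungapped sequence.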

I apologize for the lengthy text; I just hope to be as clear as possible. Thanks in advance for any help!


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=active
