[Biopython-dev] 4/11 active questions tagged biopython - Stack Overflow
Feed My Inbox
updates at feedmyinbox.com
Mon Apr 11 04:42:29 UTC 2011
// Script for parsing a biological sequence from a public database in Python
// April 10, 2011 at 3:33 PM
http://stackoverflow.com/questions/5614180/script-for-parsing-a-biological-sequence-from-a-public-database-in-python
Greetings to the Stack Overflow community,
I am currently taking a bioinformatics module as part of a biomedical degree (I am basically a Python newbie), and the following task is part of a Python programming assignment:
Extract motif sequences: amino acid sequences (essentially strings, in programming terms) that have been excised by algorithms that perform a multiple sequence alignment followed by iterative database scanning to find the best-conserved sequences. The ultimate aim is to infer functional significance from these "motifs".
These motifs are stored in a public database, in files that hold several data fields for each protein (UniProt ID, accession number, and the alignment itself, stored in a hyperlinked .seq file). Only one of these fields is of interest here: the field called "extracted motif sets". My question is how to write a script that parses these "motif strings" and writes them to a file. Very crudely, I believe the code would run something along these lines:
with open("lysozyme.seq") as motif_file:  # database files are saved as .seq
    for line in motif_file:
        if "extracted motif sets" in line:
            pass  # motif extracting code goes here
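One possible way to fill in the "motif extracting code" is sketched below. It assumes, hypothetically, that the field appears on a single line such as `extracted motif sets: GWKLC AHDEF`; the real database layout is not shown here, so FIELD and the parsing rule are placeholders to adapt. Tokens are kept only if they consist of one-letter amino acid codes.

```python
import re

# Hypothetical field label and layout ("label: MOTIF1 MOTIF2 ..."); adapt to the real files.
FIELD = "extracted motif sets"
# The 20 standard one-letter amino acid codes plus X for an unknown residue.
AMINO_ACIDS = re.compile(r"^[ACDEFGHIKLMNPQRSTVWYX]+$")


def extract_motifs(lines):
    """Return motif strings found on lines whose label matches FIELD."""
    motifs = []
    for line in lines:
        label, sep, rest = line.partition(":")
        if sep and label.strip().lower() == FIELD:
            for token in rest.split():
                # Drop trailing separators and normalise case before validating.
                token = token.strip(",;.").upper()
                if AMINO_ACIDS.match(token):
                    motifs.append(token)
    return motifs


def write_motifs(motifs, out_path):
    """Write one motif per line to out_path."""
    with open(out_path, "w") as out:
        for motif in motifs:
            out.write(motif + "\n")
```

Passing an open file object as `lines` works too, since iterating over a file yields its lines.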
There are well over a couple of hundred .seq files, so I imagine a loop would be needed to scan a large part of the database.
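Looping over every .seq file in a downloaded copy of the database could look like the sketch below. It assumes the files sit together in one local directory; the directory path and the `parse` callback (any function taking a file's text) are placeholders.

```python
from pathlib import Path


def scan_database(directory, parse):
    """Apply parse(text) to every .seq file in directory.

    Returns a dict mapping each file name to whatever parse returns.
    Only the top level of directory is searched.
    """
    results = {}
    # sorted() makes the processing order reproducible across runs.
    for path in sorted(Path(directory).glob("*.seq")):
        results[path.name] = parse(path.read_text())
    return results
```

If the files are spread over subdirectories, `Path.rglob("*.seq")` searches recursively, though file names could then collide as dictionary keys.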
The excised motifs are stored in a separate file, linked from the .seq file by a hyperlink, but they are of course subsequences of each entry in the multiple sequence alignment (.seq) files. Is there an effective way to pick up the motif sequences (strings) from one file and then search the alignment files for the same sequences?
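One subtlety when searching the alignment files: aligned rows contain gap characters, so a plain `motif in row` test misses a motif that the alignment has split with gaps. The sketch below strips gaps before searching and maps each hit back to its 0-based alignment column. It assumes `-` and `.` are the gap characters and that the aligned rows are already available as strings keyed by an identifier (for example loaded with Biopython's `Bio.AlignIO.read`, if the .seq files are in a format it supports, which is not confirmed here).

```python
def find_motif(motif, aligned_seq, gap_chars="-."):
    """Return the alignment columns where motif starts, ignoring gaps."""
    if not motif:
        return []
    # columns[i] is the alignment column of the i-th residue in the ungapped sequence.
    columns = [i for i, ch in enumerate(aligned_seq) if ch not in gap_chars]
    ungapped = "".join(aligned_seq[i] for i in columns).upper()
    motif = motif.upper()
    hits = []
    start = ungapped.find(motif)
    while start != -1:
        hits.append(columns[start])
        # Advance by one so overlapping occurrences are also reported.
        start = ungapped.find(motif, start + 1)
    return hits


def scan_alignment(motifs, rows):
    """Map each (motif, row_id) pair with at least one hit to its start columns."""
    hits = {}
    for motif in motifs:
        for row_id, seq in rows.items():
            cols = find_motif(motif, seq)
            if cols:
                hits[(motif, row_id)] = cols
    return hits
```

For example, `find_motif("WKL", "MG-WK--LCA")` finds the motif even though two gap columns interrupt it in the aligned row.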
I apologize for the long text; I just hope to be as clear as possible. Thanks in advance for any help!
--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=active