[BioPython] Random sequence

pan at uchicago.edu pan at uchicago.edu
Thu Jun 17 00:47:57 EDT 2004


(sorry, this is probably off topic, but just for fun ...)

A coding sequence that is not uniformly random: its codons are drawn
according to a predefined codon usage frequency table:

def codingSeq_codonUsage(size=30, sep='', codonUsage={}):
  # size is in bases, so size//3 codons are drawn (floor division)
  return sep.join([ randomPickDict(codonUsage) for x in range(size//3) ])

where:

-- codonUsage is a dictionary that looks like:
    { 'TTT':17.2, 'TCT':14.9,  'TAT':12.1, 'TAG':0, ...}
   see: http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=Homo+sapiens+[gbpri]

-- randomPickDict() is from: 
   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278260
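
For anyone who cannot reach the recipe, the pieces above can be sketched
as follows. This is only a minimal Python 3 sketch, not the Cookbook code:
the body of randomPickDict() here is an assumption (a weighted pick using
random.choices, skipping codons whose weight is 0), and the four-codon
table is just the fragment shown above, not a full usage table.

```python
import random

def randomPickDict(weights):
    # Assumed stand-in for the Cookbook recipe: return one key, chosen
    # with probability proportional to its value. Zero-weight keys are
    # dropped first so they can never be picked.
    keys = [k for k, w in weights.items() if w > 0]
    if not keys:
        raise ValueError("need at least one codon with a positive weight")
    return random.choices(keys, weights=[weights[k] for k in keys], k=1)[0]

def codingSeq_codonUsage(size=30, sep='', codonUsage=None):
    # size is in bases, so size // 3 codons are drawn
    codonUsage = codonUsage or {}
    return sep.join(randomPickDict(codonUsage) for _ in range(size // 3))

# Fragment of a usage table, copied from the example above
usage = {'TTT': 17.2, 'TCT': 14.9, 'TAT': 12.1, 'TAG': 0}
seq = codingSeq_codonUsage(size=30, sep=' ', codonUsage=usage)
```

With sep=' ' the result is ten space-separated codons, and 'TAG' never
appears because its weight is 0. A real table would list all 64 codons.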
     
pan


Quoting pan at uchicago.edu:

> Also see a 2-liner for a coding sequence:
> 
> >>> def codingSeq(size=30, stopCodons=['TAG','TAA', 'TGA'], sep=''):
>         codons = [x+y+z for x in 'AGTC' for y in 'AGTC' for z in 'AGTC'
>                   if x+y+z not in stopCodons]
>         return sep.join([ random.choice(codons) for x in range(size//3) ])
> 
> >>> codingSeq()
> 'AATGTTTCACTAGGTGACGTGTCGTGGCTA' 
> 
> >>> codingSeq(sep=' ')
> 'GGT GCT AAG TTC CGA TCG AAC AGA AAC TGT'
> 
> 
> 
> Quoting pan at uchicago.edu:
> 
> > You can make a random seq with one line of python code:
> > 
> > >>> import random
> > 
> > >>> ''.join([random.choice('AGTC') for x in range(10)]) 
> > 'GGTTTCGGTA'
> > 
> > >>> ''.join([random.choice('AGTC') for x in range(10)]) 
> > 'GCGGGTCCGT'
> > 
> > >>> ''.join([random.choice('AGTC') for x in range(10)]) 
> > 'AAAAGCACTG'
> > 
> > Isn't it beautiful? 
> > 
> > pan
> > 
> > Quoting ashleigh smythe <absmythe at ucdavis.edu>:
> > 
> > > On Wed, 2004-06-16 at 07:45, Sebastian Bassi wrote:
> > > > Is there a way to generate a random DNA sequence with biopython?
> > > > If not, I could submit a function to do it, but before doing it, I'd 
> > > > want to see if it's not already done.
> > > 
> > > Hi Sebastian.  I wasn't able to find a random sequence generator in the
> > > biopython modules so I wrote a simple little one of my own a few months
> > > ago- it only uses biopython modules to add the sequence to a
> > > biopython-parsed file.  It is quite ugly and brute force as I'm a
> > > beginner - I'd be curious to see what you come up with.  In case you are
> > > curious, here it is:
> > > 
> > > #This is designed to generate random DNA sequence data and add
> > > #it to the end of a biopython-parsed sequence record
> > > #in fasta format.
> > > #Modified 2-20 to just make random seq. data for a taxon,
> > > #rather than adding it onto the existing sequence.
> > > 
> > > import random
> > > import string
> > > 
> > > def generate(n):                  #generate the dna sequence of n length
> > >     bases=['A', 'T', 'G', 'C']
> > >     dna_in_list=[]
> > > 
> > >     while n > 0:
> > >         abase=random.choice(bases)
> > >         dna_in_list.append(abase)
> > >         n=n-1
> > > 
> > >     dnastring=str(dna_in_list)     #format the list into a string.  
> > >     better_dnastring=string.join(string.split(dnastring),"") #Take
> > >     better2_dnastring=string.strip(better_dnastring)         #out
> > >     better3_dnastring=better2_dnastring.replace(',','')      #unwanted
> > >     better4_dnastring=better3_dnastring.replace(']','')      #characters
> > >     better5_dnastring=better4_dnastring.replace('[','')
> > >     better6_dnastring=better5_dnastring.replace("'",'')
> > > 
> > >     return better6_dnastring
> > >    
> > >  
> > > def add_seq(file_to_add_to, n): #this is how to start
> > >     import sys                  #the program: seqgen.add_seq(file, n).
> > >     from Bio import Fasta                                     
> > >     parser=Fasta.RecordParser()
> > >     afile=open(file_to_add_to, 'r')
> > >     iterator=Fasta.Iterator(afile, parser)
> > >  
> > >     out_file=open('randomadded.nex', 'w')
> > >  
> > >     while 1:                   #loop through each record and add the new
> > >         seq_to_add=generate(n) #sequence
> > >         cur_record=iterator.next()
> > >         if cur_record is None:
> > >             break
> > >         title_and_seq=string.split(cur_record.title)
> > >         title='>' + title_and_seq[0] + '\n'
> > >         new_record=title + 'N' + seq_to_add
> > >         out_file.write(new_record)
> > >         out_file.write('\n')
> > > 
> > > 
> > > Ashleigh
> > > 
> > > _______________________________________________
> > > BioPython mailing list  -  BioPython at biopython.org
> > > http://biopython.org/mailman/listinfo/biopython
> > > 
> > 
> 
> 



