[BioPython] Random sequence

ashleigh smythe absmythe at ucdavis.edu
Wed Jun 16 14:19:36 EDT 2004


On Wed, 2004-06-16 at 07:45, Sebastian Bassi wrote:
> Is there a way to generate a random DNA sequence with biopython?
> If not, I could submit a function to do it, but before doing it, I'd 
> want to see if its not already done.

Hi Sebastian.  I wasn't able to find a random sequence generator in the
biopython modules, so I wrote a simple little one of my own a few months
ago.  It only uses biopython modules to add the sequence to a
biopython-parsed file.  It is quite ugly and brute force as I'm a
beginner, so I'd be curious to see what you come up with.  In case you are
curious, here it is:

#This is designed to generate random DNA sequence data and add
#it to the end of a biopython-parsed sequence record
#in fasta format.
#Modified 2-20 to just make random seq. data for a taxon,
#rather than adding it onto the existing sequence.

import random
import string
                                                                                                                                                                                                                                                     
def generate(n):                  #generate a dna sequence of length n
    bases=['A', 'T', 'G', 'C']
    dna_in_list=[]

    while n > 0:
        abase=random.choice(bases)    #pick one base at random
        dna_in_list.append(abase)
        n=n-1

    return ''.join(dna_in_list)   #join the list of bases into one string
   
 
def add_seq(file_to_add_to, n):   #this is how to start
    from Bio import Fasta         #the program: seqgen.add_seq(file, n).
    parser=Fasta.RecordParser()
    afile=open(file_to_add_to, 'r')
    iterator=Fasta.Iterator(afile, parser)

    out_file=open('randomadded.nex', 'w')

    while 1:                   #loop through each record and write a new
        cur_record=iterator.next() #random sequence under its title
        if cur_record is None:
            break
        seq_to_add=generate(n)
        title_and_seq=cur_record.title.split()
        title='>' + title_and_seq[0] + '\n'
        new_record=title + 'N' + seq_to_add
        out_file.write(new_record)
        out_file.write('\n')

    afile.close()              #close both files so the output is flushed
    out_file.close()
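
For comparison, here is a shorter sketch of the same idea that uses only the
Python standard library, so it does not depend on Bio.Fasta being installed.
The names random_dna and randomize_fasta are made up for this example and are
not part of biopython.  It assumes a plain fasta file in which every record
starts with a '>' title line, and unlike the script above it does not put an
'N' in front of each generated sequence.

```python
import random


def random_dna(n, alphabet="ATGC"):
    """Return a random DNA string of length n (hypothetical helper)."""
    return "".join(random.choice(alphabet) for _ in range(n))


def randomize_fasta(in_path, out_path, n):
    """Write one random n-base sequence for each fasta title in in_path.

    Only the first word of each title line is kept, as in the script
    above. Returns the number of records written.
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                # Fall back to a placeholder name if the title is empty.
                name = fields[0] if fields else "unnamed"
                dst.write(">" + name + "\n" + random_dna(n) + "\n")
                count += 1
    return count
```

Calling randomize_fasta('in.fasta', 'out.fasta', 100) would then write a
random 100-base sequence for every title found in in.fasta (both file names
are placeholders).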


Ashleigh


