[Biopython] Parsing through multiple DNA sequences one nucleotide at a time

Tue Dec 15 01:45:22 UTC 2015

Hello everyone,

  I have multiple FASTA sequences that have been aligned and cut to the
same length.  I want to go through each nucleotide position one at a time
to see if it is the same for all of the sequences.  I have written a script
that can check one nucleotide position at a time, but I can't figure out how
to get it to loop through the entire length of the sequence.

name = "test.fas"
handle = open(name, 'r')
conout = open("consensus.txt", 'a')

val = []
from Bio import SeqIO
pos = 45                                     #test position
seqlength = 1542

for seq_record in SeqIO.parse(handle, "fasta"): #parses fasta file
#    seqlength = len(seq_record.seq)
    val.append(seq_record.seq[pos])             #builds a list: index = sequence#, value = nucleotide at position (pos)
length = len(val)                               #determines the total number of values

y=0
z=1

for x in val:                                   #parses through position values
    if val[y] == val[z]:                        #checks to see if the first value equals the next one
#        print val[z]
        z=z+1
        if z == length:                         #if all values are the same, writes value (nucleotide) to file
            print "position " + str(pos+1) + " equals " + val[y]
            conout.write(val[y])
            break
    else:                                       #if all values are not the same, writes newline to file
        print "position " + str(pos+1) + " does not have a common nucleotide"
        conout.write('\n')
        break

  The way I have written the script, if all of the nucleotides in the same
position are the same, it will write the nucleotide to a file.  If they are
not, it will write a newline.  What I want is for the script to go through
the length of the DNA sequence (1542 bp) and write this information to a
text file, so that I will end up with essentially all of the consensus
sequences, which I can then check for potential primer locations.
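
For illustration, here is a minimal sketch of the position-by-position
comparison described above. It is only one way to do it, not the original
script: the names shared_positions and read_fasta are made up for this
example, and it assumes every sequence has already been trimmed to the same
length. The comparison works on plain strings, so Biopython is needed only
for the optional file-reading helper.

```python
def shared_positions(seqs):
    """Return one character per alignment column: the nucleotide when every
    sequence agrees at that column, otherwise a newline."""
    if not seqs:
        return ""
    if len(set(len(s) for s in seqs)) != 1:
        raise ValueError("sequences must all be the same length")
    out = []
    for column in zip(*seqs):        # column = the nucleotides at one position
        if len(set(column)) == 1:    # every sequence has the same nucleotide
            out.append(column[0])
        else:
            out.append("\n")
    return "".join(out)


def read_fasta(path):
    """Hypothetical helper: load every record as a plain string.
    Requires Biopython; imported here so the rest runs without it."""
    from Bio import SeqIO
    with open(path) as handle:
        return [str(rec.seq) for rec in SeqIO.parse(handle, "fasta")]


# Small stand-in for the aligned file: position 4 differs, the rest agree.
demo = ["ACGTAC", "ACGAAC", "ACGTAC"]
print(repr(shared_positions(demo)))
```

With a real file this would be something like
conout.write(shared_positions(read_fasta("test.fas"))). Note that a column
made up entirely of gap characters would be written out as a gap, since the
sketch treats every character the same way.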

When I try to put in a for or while loop, I end up getting the first
nucleotide repeated for each position.  I think I just need to clear/reset
the val list after each run, but it doesn't seem to work.
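
One possible cause of the repeated first nucleotide (an assumption, since the
looped version of the script is not shown) is that the file is parsed only
once: SeqIO.parse returns an iterator that reads the handle a single time, so
a second pass over the same open handle finds it already at the end of the
file and val keeps its old contents. The sketch below reproduces that pattern
with a stand-in generator called fake_parse (hypothetical, used here so the
example runs without Biopython) and then shows one fix: read the records into
a list once and rebuild val for every position.

```python
def fake_parse(seqs):
    """Stand-in for SeqIO.parse: a generator that can be consumed only once."""
    for s in seqs:
        yield s


aligned = ["ACGT", "ACGA", "ACGT"]

# Problem pattern: the parser is created once, outside the position loop.
records = fake_parse(aligned)
val = []
stale = []
for pos in range(4):
    for seq in records:             # yields data only on the first pass
        val.append(seq[pos])
    stale.append(val[0])            # val still holds the position-0 values

# One fix: read everything into a list once, then build a new val per position.
records = list(fake_parse(aligned))
fresh = []
for pos in range(4):
    val = [seq[pos] for seq in records]   # new list for each position
    fresh.append(val[0])

print(stale)   # the first nucleotide repeated for every position
print(fresh)   # the actual nucleotide of the first sequence at each position
```

With Biopython, the equivalent of the fix would be
records = list(SeqIO.parse(handle, "fasta")) before the position loop.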

  Any help would be greatly appreciated.

  Damian

-- 
Damian Menning, Ph.D.

"There are two types of academics. Those who use the Oxford comma, those
who don't and those who should."

Standard comma - You know Bob, Sue and Greg? They came to my house.
Oxford comma - You know Bob, Sue, and Greg? They came to my house.
Walken Comma - You know, Bob, Sue, and Greg? They came, to my house.
Shatner comma - You, know, Bob, Sue, and Greg? They, came, to my house.