[BioPython] ACE contig to alignment
David Winter
winda002 at student.otago.ac.nz
Tue Mar 3 22:03:36 UTC 2009
Hi all,
I'd like to start by thanking everyone that's contributed to biopython
and especially the cookbook/tutorial - its been a great help to this
empiricist getting into some (decidedly amateur) bioinformatics.However,
for the first time I've run into a problem the available docs can't help
me with.
I want to be able to represent all of the reads that contribute to a 454
sequencing contig as a generic biopython alignment. I've written some
code that I thought would pad/cut the reads to size and add them to an
alignment but when I run it a significant minority of the contigs in the
files I'm working with have misalignments. I was wondering if someone
more familiar with the ace parser or generic alignment class could tell
me if I'm making some elementary mistake (it is possible that original
alignment was bad, just seems more likely I did something dumb). I can
send along an ACE file if you want to run the script (didn't want to
spam the list with attachments).
Thanks in advance for any pointers and I'm sorry to force people to read
what I'm sure is inelegant code:
from Bio.Sequencing import Ace
from Bio.Align.Generic import Alignment
from Bio.Alphabet import IUPAC, Gapped
ace_handle = open('eldoni.ace', 'r')
contigs = Ace.parse(ace_handle)
alignments = [] #start the list to which we'll add the contig data
for contig in contigs:
conname = contig.name + " numreads=" + str(contig.nreads)
conlength = len(contig.sequence)
align = Alignment(Gapped(IUPAC.ambiguous_dna, "*"))
for readn in range(len(contig.reads)):
start = contig.af[readn].padded_start # position rel to consensus
if start < 1:
# If 'start' is negative or zero we need to ignore bases
readseq = contig.reads[readn].rd.sequence[-1 * start+1:]
else:
# If it's larger then the start needs to be padded with gaps
readseq = (start-1) * '*' + contig.reads[readn].rd.sequence
#Finally, pad the end then cut to size
readseq = readseq + (conlength-len(readseq)) * '*'
readseq = readseq[:conlength]
align.add_sequence(readn+1, readseq)
condata = conname, align
alignments.append(condata)
--
PhD Student
Allan Wilson Centre
Department of Zoology
University of Otago, PO Box 56, Dunedin 9054
p: +64-3-4798459
m: +64-27-3326815
e: winda002 at student.otago.ac.nz
More information about the Biopython
mailing list