[BioPython] alignment processing

cgw501 at york.ac.uk cgw501 at york.ac.uk
Tue May 17 15:07:03 EDT 2005


Hi,

I have a file processing task I'm trying to do with biopython. I have to 
take a bunch of clustal alignment files that cover one arm of a whole 
chromosome, strip off the lowercase letters at the end of each sequence, 
and produce a file containing all the stripped sequences together in FASTA 
format. This is what I have so far:

import sys

import Bio.Clustalw
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
from Bio.SeqIO import FASTA
from Bio.SeqRecord import SeqRecord

inputs = sys.argv[1:-2]
output = open(sys.argv[-1], 'w')


for f in inputs:
    align = Bio.Clustalw.parse_file(f, alphabet=IUPAC.ambiguous_dna)
    lines = align.get_all_seqs()

    strippedAlignRecord = []
    for line in lines:
        lineSeq = line.seq
        lineString = lineSeq.tostring()
        strippedSeq = lineString.rstrip('atcg-')
        strippedSeqObj = Seq(strippedSeq, IUPAC.ambiguous_dna)
        strippedRecObj = SeqRecord(strippedSeqObj, id = line.description)
        out = FASTA.FastaWriter(output)
        out.write(strippedRecObj)

When I run this from the command line I don't get any errors, but the 
outfile is not created. I'm a bit flummoxed. Any ideas?
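
One way to narrow this down is to check the two plain-Python pieces of the script on their own, without Biopython: which arguments the slice of sys.argv actually selects, and what rstrip('atcg-') removes. The following is only a minimal sketch under assumptions: the file names are made up, and split_args and strip_trailing_lowercase are hypothetical helper names that do not appear in the script above.

```python
def split_args(argv):
    """Split an argv-style list into (input paths, output path).

    Assumes the last argument is the output file and everything
    between the script name and the output file is an input.
    """
    return argv[1:-1], argv[-1]


def strip_trailing_lowercase(seq, chars="atcg-"):
    """Remove trailing characters found in `chars` from a sequence string."""
    return seq.rstrip(chars)


if __name__ == "__main__":
    # Hypothetical command line: two alignments and one output file.
    argv = ["strip_align.py", "arm1.aln", "arm2.aln", "out.fasta"]

    # Compare the slice used in the script with one that stops
    # just before the output file name.
    print("argv[1:-2] ->", argv[1:-2])
    print("argv[1:-1] ->", argv[1:-1])

    inputs, output_name = split_args(argv)
    print("inputs:", inputs, "output:", output_name)

    # Only the trailing run of a/t/c/g/- is removed; uppercase is kept.
    print(strip_trailing_lowercase("ACGTNNACGTacg--tt"))
```

If the script is called with a single alignment file and an output name, sys.argv[1:-2] is an empty list, so the loop body would never run and nothing would be written. That may be worth ruling out before looking at the Biopython calls.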

Thanks,

Chris

