[Biopython] Concatenate to aligned sequences

Vincent Davis vincent at vincentdavis.net
Thu Feb 14 17:20:58 UTC 2013


I have 2 FASTA files from a MUSCLE alignment. Both have the same number of
sequences from the same organism. If I want to concatenate the pairs of
sequences, what is the best way to do this?
Right now I am doing this:

from Bio import SeqIO


def concatenate(fa1, fa2):
    # index both alignments by sequence id; the files close on leaving the block
    with open(fa1) as fa1open, open(fa2) as fa2open:
        fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
        fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
    # check that both files have the same sequence ids
    if set(fa1dict.keys()) != set(fa2dict.keys()):
        print(fa1dict.keys(), fa2dict.keys())
        print('The fasta files do not have the same sequences')
    bothdict = {}
    bothlist = []
    for key in fa2dict.keys():
        # reuse the record from fa2 and append the fa1 sequence to it
        bothdict[key] = fa2dict[key]
        bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
        bothlist.append(bothdict[key])
    return bothdict, bothlist
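
For comparison, here is a minimal sketch of the same id-keyed pairing on plain strings, without Biopython. The helper names read_fasta and concatenate_by_id and the example ids orgA and orgB are made up for illustration. It assumes the first file's sequence should come first and that mismatched ids should stop the run rather than only print a warning; both are assumptions, not something settled above.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of id -> sequence string (hypothetical helper)."""
    records = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # the id is the first whitespace-delimited token of the header
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def concatenate_by_id(first, second):
    """Join sequences that share an id; raise if the two id sets differ."""
    if set(first) != set(second):
        unmatched = sorted(set(first) ^ set(second))
        raise ValueError("ids present in only one file: " + ", ".join(unmatched))
    # keep the order of the first file; gap characters are joined like any other
    return {key: first[key] + second[key] for key in first}


# two tiny made-up alignments whose records appear in different orders
gene1 = read_fasta(io.StringIO(">orgA\nAC-GT\n>orgB\nACGGT\n"))
gene2 = read_fasta(io.StringIO(">orgB\nTT-A\n>orgA\nTTGA\n"))
combined = concatenate_by_id(gene1, gene2)
print(combined)
```

Because the pairing goes through the ids, the order of records in the second file does not matter. The same logic should carry over to the dictionaries returned by SeqIO.to_dict.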

Vincent Davis
