[Biopython] Concatenate to aligned sequences

Peter Cock p.j.a.cock at googlemail.com
Thu Feb 14 17:29:12 UTC 2013


On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> I have 2 FASTA files from a MUSCLE alignment. Both have the same number of
> sequences from the same organism. If I want to concatenate the pairs of
> sequences, what is the best way to do this?
> Right now I am doing this:
>
> from Bio import SeqIO
>
> def concatenate(fa1, fa2):
>     # load each FASTA file into a dict keyed by record id
>     with open(fa1) as fa1open:
>         fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>     with open(fa2) as fa2open:
>         fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     # check that both files have the same sequence ids
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     for key in fa2dict.keys():
>         # reuse the record from the second file and append the first file's sequence
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003

Have you tried loading the two alignment files via AlignIO,
sorting by name if required, and adding the alignment objects?

http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__
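
The suggestion above could look roughly like the following. This is only a sketch, not a tested recipe for your data: the helper name concatenate_alignments and the tiny in-memory sequences are made up for illustration, and it assumes both files are FASTA alignments whose record ids match exactly. It uses AlignIO.read, MultipleSeqAlignment.sort (which orders rows by record id by default) and the + operator on alignments.

```python
from io import StringIO

from Bio import AlignIO


def concatenate_alignments(source1, source2, fmt="fasta"):
    """Join two alignments column-wise, pairing rows by record id.

    Hypothetical helper: source1/source2 may be filenames or open handles.
    """
    aln1 = AlignIO.read(source1, fmt)
    aln2 = AlignIO.read(source2, fmt)
    # Sort both alignments by record id so row i in each is the same organism.
    aln1.sort()
    aln2.sort()
    ids1 = [rec.id for rec in aln1]
    ids2 = [rec.id for rec in aln2]
    if ids1 != ids2:
        raise ValueError("The two alignments do not contain the same record ids")
    # Adding two alignments concatenates their columns row by row.
    return aln1 + aln2


# Made-up example data; the second "file" lists the records in a different order.
first = StringIO(">beta\nAC-T\n>alpha\nACGT\n")
second = StringIO(">alpha\nGG\n>beta\nG-\n")
combined = concatenate_alignments(first, second)
print(combined)
```

Sorting before adding matters because the + operator pairs rows by position, not by name. The result can be written back out with AlignIO.write(combined, "combined.fasta", "fasta"), where the output filename is again just a placeholder.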

Peter


