[Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence?
Roy Chaudhuri
roy.chaudhuri at gmail.com
Thu Aug 20 16:42:23 UTC 2009
Hi Dan,
I think you want the Bio::LocatableSeq method "column_from_residue_number".
You might also try combining your pairwise alignments using the profile
alignment option in ClustalW.
Cheers.
Roy.
Dan Bolser wrote:
> Hi,
>
> Quick version: How do I get a column of Bio::SimpleAlign using
> ungapped 'reference' sequence coordinates?
>
>
>
> Longer version:
>
> I have a set of pairwise alignments that I would like to process into
> a 'multiple sequence alignment' (MSA). All the alignments are short
> sequence 'contigs' aligned to a 'reference' sequence, so one sequence
> in all the pairwise alignments is constant (making the resulting MSA
> unambiguous).
>
> I came up with the following pseudo-code to create a MSA
> (Bio::SimpleAlign) from the set of pairwise alignments...
>
> initialise:
> Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence.
>
> for each pairwise alignment:
> Create a Bio::LocatableSeq from the given fragment of the
> REFERENCE sequence (using ungapped REFERENCE coordinates).
>
> for each gap in the REFERENCE sequence:
> Take the position of the gap (in ungapped REFERENCE
> coordinates) and look up the corresponding column of the MSA
> (in ungapped REFERENCE coordinates).
>
> for each sequence in the column:
> Check if there is a gap-character at this position.
>
> if any sequence has a non gap-character at this position:
> Stick a gap in the MSA just before this position.
>
> Create a Bio::LocatableSeq from the CONTIG sequence (using
> ungapped REFERENCE coordinates) and add it to the
> Bio::SimpleAlign.
>
> done.
>
>
> I would very much appreciate, 1) feedback on the correctness of the
> above algorithm (it could be horribly wrong), and 2) advice on how to
> get a column of the alignment using ungapped REFERENCE coordinates?
>
>
> Sorry if this is a solved problem (where is it solved?). If not, and
> if I can get it working, I'll try to write a generic function to merge
> two MSAs when they have a reference sequence in common.
>
>
> For your reference, the pairwise alignments come from the show-aligns
> command in the MUMmer sequence alignment package, and have the
> following format:
>
> my.reference.fasta my.contigs.multi.fasta
>
> ============================================================
> -- Alignments between REFERENCE and CONTIG00012
>
> -- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ]
>
>
> 29237 aataacctctttaag.taatatttttctctggtcccaacttgcgccaat
> 1 aataa.ctctttaagataatatttttctctggtcccgacttgggccaat
> ^ ^ ^ ^
>
> 29286 ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta
> 49 ggaaaaaaatcactatttcgataagataataagata.atatattttcaa
> ^^ ^ ^ ^
>
> 29335 aagacccctacataaatatatggtcccattaatattataaattaataat
> 97 aagacccctatataaatatatggtctcattaatattataaattaataat
> ^ ^
>
> ...
>
>
> For further reference:
>
> This thread:
> http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html
>
> http://www.bioperl.org/wiki/Align_Refactor
>
> http://www.bioperl.org/wiki/Alignment_object
>
>
>
> All the best,
> Dan.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list