[Bioperl-l] Parsing CAP3 output to Fasta

Baik, Ki ki.baik at roche.com
Thu Dec 20 00:58:42 UTC 2007


Hello,

 

I'm interested in parsing the output of the CAP3 contig assembly program
into a more manageable format. The CAP3 output is shown below:

 

                .    :    .    :    .    :    .    :    .    :    .    :
Seq1+       CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC
            ____________________________________________________________
consensus   CTGGATGGGTTAATTTACTCCCATAAGAGAGCAGAAATCCTGGATCTCTGGATATATCAC

                .    :    .    :    .    :    .    :    .    :    .    :
Seq1+       ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC
Seq2+       ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC
            ____________________________________________________________
consensus   ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACACCGGGACCAGGACCTAGATTCC

                .    :    .    :    .    :    .    :    .    :    .    :
Seq1+       CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC
Seq2+       CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC
            ____________________________________________________________
consensus   CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCAGCAGAAGAGGCAGAGAGAC

                .    :    .    :    .    :    .    :    .    :    .    :
Seq1+       TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG
Seq2+       TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC
            ____________________________________________________________
consensus   TGGGTAATACAAATGAAGATGCTAGTCTTCTACATCCAGCTTGTAATCATGGAGCTGAGG

 

I would like to preserve each sequence's position within the alignment.
Ideally the result would be a FASTA file that keeps every base at its
aligned position, padded with gaps where needed, such as below:

 

>Seq1+
CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC
ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC
CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC
TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

>Seq2+
------------------------------------------------------------
ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC
CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC
TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC--------
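
To make the transformation concrete, here is a rough sketch of the logic,
written in plain Python rather than BioPerl. It is only an illustration
under several assumptions: the input is the alignment section of a single
contig, every block ends with a row labelled "consensus", all rows in a
block share the same name-column width, and read names are shorter than
that column. The function names parse_cap3_alignment and to_fasta are
made up for this example and are not part of any library.

```python
import re


def parse_cap3_alignment(text, consensus_name="consensus"):
    """Return {read_name: [one aligned chunk per block]} for one contig."""
    rows = {}      # read name -> aligned chunks collected so far
    widths = []    # width of each finished block, used to back-fill gaps
    pending = []   # raw read lines of the block currently being read
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped:
            continue
        if set(stripped) <= set(".: ") or set(stripped) == {"_"}:
            continue  # ruler line, or the underline above the consensus
        if stripped.split()[0] != consensus_name:
            pending.append(line)
            continue
        # The consensus row closes the block and fixes its column layout
        # (assumption: the consensus spans the full width of the block).
        match = re.match(r"\s*\S+\s+", line)
        if match is None:
            pending = []
            continue
        col = match.end()
        width = len(line) - col
        seen = set()
        for read_line in pending:
            name = read_line.split()[0]
            # Blanks before or after the read become gap characters.
            chunk = read_line[col:col + width].replace(" ", "-")
            chunk = chunk.ljust(width, "-")
            if name not in rows:
                # Read first appears here: fill earlier blocks with gaps.
                rows[name] = ["-" * w for w in widths]
            rows[name].append(chunk)
            seen.add(name)
        for name, chunks in rows.items():
            if name not in seen:
                chunks.append("-" * width)  # read absent from this block
        widths.append(width)
        pending = []
    return rows


def to_fasta(rows):
    """Write one FASTA record per read, one line per alignment block."""
    return "".join(
        ">%s\n%s\n" % (name, "\n".join(chunks))
        for name, chunks in rows.items()
    )
```

Calling to_fasta(parse_cap3_alignment(text)) on the alignment text would
give one record per read, with one line per block. Buffering the read
rows until the consensus row is seen is a deliberate choice here: the
consensus row is what tells us where the sequence column starts and how
wide the block is, so short or late-starting reads can be padded.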

 

 

Does anyone have any experience doing this?

 

Regards,

 

KB




