[Biopython] SIBsim4 alignment support

Peter biopython at maubp.freeserve.co.uk
Tue May 4 13:27:28 UTC 2010


On Tue, May 4, 2010 at 1:27 PM, Martin Mokrejs
<mmokrejs at ribosome.natur.cuni.cz> wrote:
> Hi,
>  I wonder whether there is anybody having time to write a parser for the
> output of:
> SIBsim4 -A 4 chr.fasta spliced_mRNA.fasta
> SIBsim4 -A 4 chr.fasta spliced_mRNA_rc.fasta
> SIBsim4 -A 4 chr_rc.fasta spliced_mRNA.fasta
> SIBsim4 -A 4 chr_rc.fasta spliced_mRNA_rc.fasta
>
> ...
>
> You can get it from http://sibsim4.sourceforge.net/ . This is a nice program
> to inspect exon/intron boundaries and I would like to get the sequences of
> the individual HSPs corresponding to the exons but fixed by the genomic
> sequence. SIBsim4 does not print out number of identities/similarities
> within each HSP but that would be the next I would do in python. ;)
>
>  I could probably go and write the parser but would need some time to
> learn the structure of Bio.AlignIO code ... and from a quick glance over
> Bio/AlignIO/FastaIO.py I am not sure how much time I would need. ;)

Looking at the FASTA m10 alignment parser is sensible, since it handles
another pairwise alignment format - but it isn't the nicest parser in the
world.

How much of the data do you actually care about? Just the pairwise
alignment (two sequences)? Right now annotation support in the
alignment object is limited. This is something I am working on, but
it is not likely to be in the imminent Biopython 1.54 release.
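
If the pairwise alignment is all that is needed, the per-HSP identity
count mentioned in the quoted message can be worked out from the two
gapped strings alone, without any alignment object. A minimal sketch in
plain Python, assuming the aligned genomic and mRNA strings for one exon
have already been pulled out of the SIBsim4 output (the function name and
the sample strings are made up for illustration; this is not part of
Biopython or SIBsim4):

```python
def count_identities(genomic, mrna, gap="-"):
    """Return (identities, aligned_columns) for two gapped strings.

    Both strings must have the same length, as they would in a
    pairwise alignment block. Comparison is case-insensitive.
    """
    if len(genomic) != len(mrna):
        raise ValueError("aligned strings must have the same length")
    identities = 0
    columns = 0
    for g, m in zip(genomic.upper(), mrna.upper()):
        if g == gap and m == gap:
            continue  # ignore columns that are gaps in both sequences
        columns += 1
        if g == m:
            identities += 1  # a gap never matches here: both-gap was skipped
    return identities, columns


# Hypothetical exon block, not real SIBsim4 output.
ident, cols = count_identities("ACGT-ACGTTA", "ACGTTAC-TTA")
print("%i identities over %i columns" % (ident, cols))
```

Counting similarities as well would need a scoring scheme on top of
this, which is left out here.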

Related to the above, which of the output formats are you planning to
support? http://sibsim4.sourceforge.net/manpage.html

Peter



