[BioPython] AlignIO: Sequences of different length

Peter biopython at maubp.freeserve.co.uk
Tue Dec 9 10:17:40 UTC 2008


On Mon, Dec 8, 2008 at 11:26 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Well, as promised, here goes the update. I didn't try with soaplab2 because it
> was too complicated to get it to work. I didn't want to lose more than 10
> minutes either so... However, with standalone needle, which EBI claim to be
> the same version as the soaplab2 service, it works flawlessly :)
>
> Code here: http://pastebin.com/f29ff12d6
>
> Console output here: http://pastebin.com/f5bbc5593
>
> It's not a bug then, it's just an old version :)

Well, arguably it would be nice if Biopython could parse old versions of
the EMBOSS pairs/simple output too, but it's not so important.

> Using the web versions, there may be some workarounds. If you convert
> the format to one of the others, you may get a usable one for Biopython.

If you just want the alignment itself, using FASTA as the output
format from needle is very simple.

e.g.

$ needle one.fasta two.fasta --auto --filter -aformat fasta
>E1
MSSDRQRSDDES-PSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNTKLS-SKTTAK
LSTSAKRIQKELAEITLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFSS
DYPFKPPKVTFRTRIYHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADP
LVGSIATQYLTNRAEHDRIARQWTKRYAT
>E2
-----GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKESKVSMSKNSKL
LSTSAKRIQKELADITLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFTP
EYPFKPPKVTFRTRIYHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADP
LVGSIATQYMTNRAEHDRMARQWTKRYAT
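
As a rough sketch of why the gapped FASTA output is easy to work with: every
record is padded with "-" to the same number of columns, so a few lines of
plain Python can read it and confirm the rows line up. The function names and
the toy sequences below are made up for illustration (they are not needle
output and not part of Biopython); with Biopython installed,
Bio.AlignIO.read(handle, "fasta") is the usual way to load such a file.

```python
def read_gapped_fasta(text):
    """Return a list of (name, gapped_sequence) pairs from FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_length(records):
    """Return the shared column count, or raise if the rows differ in length."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("Sequences of different length: %s" % sorted(lengths))
    return lengths.pop()


# Hypothetical toy alignment, wrapped over two lines like the needle output.
example = """>E1
MSSDRQ-PSTS
SGSS
>E2
--SDRQRASTS
S-SS
"""

records = read_gapped_fasta(example)
columns = alignment_length(records)
print(columns, [name for name, _ in records])
```

If the rows do not all have the same length, the input is not a gapped
alignment (for instance, it is the unaligned input FASTA file), and the check
raises an error instead of returning a column count.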

> I tried markx1 I believe, and it was "almost" parsable, it just didn't get the
> correct sequences (if you deleted everything BUT the sequences, it would
> work).

How were you trying to parse the markx1 output?

Note that the EMBOSS markx10 output is similar to, but differs from,
the FASTA -m 10 output (which Biopython can parse as the "fasta-m10"
format in Bio.AlignIO).

> So, I think there should at least be a warning somewhere for the
> users so that they don't go nuts or report bugs :)

Do you mean a warning about trying to use Bio.AlignIO with the
"emboss" format to read output from old versions of the EMBOSS needle
tool?

Peter



