[BioPython] AlignIO: Sequences of different length

Peter biopython at maubp.freeserve.co.uk
Thu Dec 4 18:10:58 UTC 2008


On Thu, Dec 4, 2008 at 6:02 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Well, bad news, I'd rather have it be a problem with my code :D No problem
> at all to include my output.

Thanks.  For anyone wanting to try this at home, working backwards
from the answer, the first input sequence is:

>E1
MSSDRQRSDDESPSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNTKLSSKTTAKLS
TSAKRIQKELAEITLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFSSDY
PFKPPKVTFRTRIYHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADPLV
GSIATQYLTNRAEHDRIARQWTKRYAT

And the second:

>E2
GMSDDDSRASTSSSSSSSSNQQTEKETNTPKKKESKVSMSKNSKLLSTSAKRIQKELADI
TLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFTPEYPFKPPKVTFRTRI
YHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADPLVGSIATQYMTNRAE
HDRMARQWTKRYAT
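For anyone trying this locally, here is a minimal Python sketch. It writes the two sequences above to FASTA files and runs needle on them. Several details are my own assumptions and do not come from the original output: EMBOSS `needle` must be on the PATH, the file names (`E1.fasta`, `E2.fasta`, `E1_E2.needle`) are made up, and the gap values passed are needle's usual defaults (gap open 10.0, gap extend 0.5). The parameters used by the webservice are not known.

```python
import shutil
import subprocess
from pathlib import Path

# The two input sequences recovered above (E1 and E2).
SEQS = {
    "E1": "MSSDRQRSDDESPSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNTKLSSKTTAKLS"
          "TSAKRIQKELAEITLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFSSDY"
          "PFKPPKVTFRTRIYHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADPLV"
          "GSIATQYLTNRAEHDRIARQWTKRYAT",
    "E2": "GMSDDDSRASTSSSSSSSSNQQTEKETNTPKKKESKVSMSKNSKLLSTSAKRIQKELADI"
          "TLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFTPEYPFKPPKVTFRTRI"
          "YHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADPLVGSIATQYMTNRAE"
          "HDRMARQWTKRYAT",
}


def write_fasta(name: str, seq: str, path: str, width: int = 60) -> None:
    """Write one record as FASTA, wrapping the sequence at `width` columns."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    Path(path).write_text("\n".join(lines) + "\n")


def needle_args(a: str, b: str, out: str) -> list:
    """Build a needle command line using the usual default gap penalties."""
    return ["needle", "-asequence", a, "-bsequence", b,
            "-gapopen", "10.0", "-gapextend", "0.5", "-outfile", out]


if __name__ == "__main__":
    for name, seq in SEQS.items():
        write_fasta(name, seq, f"{name}.fasta")
    args = needle_args("E1.fasta", "E2.fasta", "E1_E2.needle")
    if shutil.which("needle"):
        subprocess.run(args, check=True)  # writes the pairwise alignment file
    else:
        print("needle not found; command would be:", " ".join(args))
```

Passing every required option on the command line stops needle from prompting interactively. That makes the run repeatable.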

I've assumed default needle parameters are being used.  It's the start
of the alignment which is causing the problem, i.e. this bit of your
file:

E1                 1 MSSDRQRSDDES-PSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNT     49
                          ..|||:| .||||.||.:          ..|:..|.:.|.:||.:
E2                 1      GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKES     35

This is easier to see with a fixed width font, but compare it to what
I get using EMBOSS 6.0.1 on my local machine:

E1                 1 MSSDRQRSDDES-PSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNT     49
                          ..|||:| .||||.||.:          ..|:..|.:.|.:||.:
E2                 1 -----GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKES     35

Note that here the second sequence, E2, has five leading gap
characters.  These are missing in your file, where spaces have been
used instead, and the Biopython parser was not expecting this.
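As a stop-gap, a file like this could in principle be repaired before parsing. The sketch below swaps the spaces at the start of a sequence field for "-" gap characters, so the block looks like the EMBOSS 6.0.1 output above. `fix_leading_gaps` is a hypothetical helper and not part of Biopython. It assumes a fixed-column layout: the other line in the same block must be well formed, and its sequence must begin right after "name, spaces, start position, one space".

```python
import re


def fix_leading_gaps(ref_line: str, bad_line: str) -> str:
    """Hypothetical helper: turn leading spaces in bad_line's sequence into '-'."""
    # Locate the sequence column from the well-formed line:
    # name, whitespace, start position, then a single space.
    m = re.match(r"\S+\s+\d+ ", ref_line)
    if m is None:
        raise ValueError("reference line does not look like a sequence line")
    col = m.end()
    head, field = bad_line[:col], bad_line[col:]
    # Spaces at the start of the sequence field stand in for leading gaps.
    n_gaps = len(field) - len(field.lstrip(" "))
    return head + "-" * n_gaps + field[n_gaps:]


good = "E1                 1 MSSDRQRSDDES-PSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNT     49"
bad = "E2                 1      GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKES     35"
print(fix_leading_gaps(good, bad))
```

This only handles leading gaps written as spaces. It does not check whether the same thing happens to trailing gaps at the end of an alignment.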

What URL are you using for the EMBOSS webservice?  I'd like to try
this myself, and if possible see what version of EMBOSS they are using
on the server.

Peter
