[BioPython] AlignIO: Sequences of different length
Peter
biopython at maubp.freeserve.co.uk
Thu Dec 4 18:10:58 UTC 2008
On Thu, Dec 4, 2008 at 6:02 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Well, bad news, I'd rather have it be a problem with my code :D No problem
> at all to include my output.
Thanks. For anyone wanting to try this at home, working backwards
from the answer, the first input sequence is:
>E1
MSSDRQRSDDESPSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNTKLSSKTTAKLS
TSAKRIQKELAEITLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFSSDY
PFKPPKVTFRTRIYHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADPLV
GSIATQYLTNRAEHDRIARQWTKRYAT
And the second:
>E2
GMSDDDSRASTSSSSSSSSNQQTEKETNTPKKKESKVSMSKNSKLLSTSAKRIQKELADI
TLDPPPNCSAGPKGDNIYEWRSTILGPPGSVYEGGVFFLDITFTPEYPFKPPKVTFRTRI
YHCNINSQGVICLDILKDNWSPALTISKVLLSICSLLTDCNPADPLVGSIATQYMTNRAE
HDRMARQWTKRYAT
I've assumed default needle parameters are being used. It's the start
of the alignment which is causing the problem, i.e. this bit of your
file:
E1 1 MSSDRQRSDDES-PSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNT 49
          ..|||:| .||||.||.:          ..|:..|.:.|.:||.:
E2 1      GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKES 35
This is easier to see with a fixed width font, but compare it to what
I get using EMBOSS 6.0.1 on my local machine:
E1 1 MSSDRQRSDDES-PSTSSGSSDADQRDPAAPEPEEQEERKPSATQQKKNT 49
          ..|||:| .||||.||.:          ..|:..|.:.|.:||.:
E2 1 -----GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKES 35
Note that here the second sequence, E2, has five leading gap
characters. These are missing in your file, where spaces have been
used instead, and the Biopython parser was not expecting this.
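As a stop-gap, the leading spaces could be turned back into gap
characters before the row is used. The following is only a rough
sketch, not part of Biopython: restore_leading_gaps is a hypothetical
helper, and it assumes you have already pulled the sequence column out
of the file with its leading whitespace intact.

```python
def restore_leading_gaps(aligned_row, gap="-"):
    """Replace leading spaces in one alignment row with gap characters.

    Hypothetical helper: only leading spaces are touched, everything
    after the first non-space character is left unchanged.
    """
    stripped = aligned_row.lstrip(" ")
    n_missing = len(aligned_row) - len(stripped)
    return gap * n_missing + stripped


# The E2 row as it appears in the problem file (five leading spaces).
e2_row = "     GMSDDDSRASTSSSSSSS----------SNQQTEKETNTPKKKES"
fixed_e2 = restore_leading_gaps(e2_row)
print(fixed_e2)
```

The repaired row should then have the same number of columns as the E1
row, which is what an alignment parser needs.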
What URL are you using for the EMBOSS webservice? I'd like to try
this myself, and if possible see what version of EMBOSS they are using
on the server.
Peter