[Biopython] Help modify this code so it can do what I want it to do

Peter Cock p.j.a.cock at googlemail.com
Wed Feb 5 16:07:22 UTC 2014


Hi Edson,

I can see where the problem stems from now - it did puzzle me for a while.
For this part to make sense, query and sbjct need to be the FULL sequence
of the query and the subject (as given to BLAST as input):

    complete_query_seq += str(query[q_start-1:q_end])
    complete_sbjct_seq += str(sbjct[sb_start-1:sb_end])

(I had assumed these variables were set up at the beginning of the file,
which is partly why I asked for the full script.)
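
To illustrate why the slicing only makes sense on the full sequences, here is
a small sketch using made-up data (the sequence, coordinates and aligned
fragment are invented for the example, not taken from your BLAST output).
BLAST coordinates are 1-based and inclusive, hence the q_start-1:

```python
# Hypothetical full query sequence, as it would appear in the FASTA
# file given to BLAST (invented for illustration).
full_query = "MKTAYIAKQRQISFVKSHFSRQ"

# Hypothetical HSP values: 1-based, inclusive coordinates on the query,
# plus the aligned fragment, which contains a gap character.
q_start, q_end = 5, 12
hsp_query = "YIAK-QRQI"

# Slicing the FULL sequence recovers the ungapped region the HSP covers.
from_full = full_query[q_start - 1:q_end]

# Slicing the aligned FRAGMENT with the same coordinates does not,
# because those coordinates refer to the full sequence.
from_fragment = hsp_query[q_start - 1:q_end]

print(from_full)      # YIAKQRQI
print(from_fragment)  # -QRQI
```

The first slice matches the HSP with its gap removed; the second is just an
arbitrary tail of the aligned fragment.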

However, via the for loop, you are using hsp.query and hsp.sbjct as query
and sbjct. These are the PARTIAL sequences, aligned with gap characters.
This might do what you seemed to want:

    complete_query_seq += query.replace("-", "")
    complete_sbjct_seq += sbjct.replace("-", "")

However, this will only concatenate the fragments within each HSP - any bit
of the query or subject which did not align will not be included. Any bit
which appears in more than one HSP will be there twice. Also, if you're
using masking you'll have XXXXXX regions in the sequence where the filter
said it was low complexity.
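
A quick sketch of the first two caveats, again with invented data (the
sequence and the two overlapping HSPs below are hypothetical):

```python
# Hypothetical full query sequence (18 letters).
full_query = "ACGTACGTTTGGCCAATT"

# Hypothetical HSPs as (q_start, q_end, aligned query fragment).
# The two HSPs overlap on positions 5-8, and nothing aligns to 13-18.
hsps = [
    (1, 8, "ACGT-ACGT"),
    (5, 12, "ACGTTTGG"),
]

complete_query_seq = ""
for q_start, q_end, query in hsps:
    # Strip gap characters and append, as in the loop discussed above.
    complete_query_seq += query.replace("-", "")

print(complete_query_seq)                # ACGTACGTACGTTTGG
print(complete_query_seq == full_query)  # False
```

The overlapping ACGT is included twice, and the unaligned CCAATT at the end
is missing altogether.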

I would instead get the original unmodified query/subject sequences
from the original FASTA files given to BLAST.
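
Something along these lines should work, assuming each FASTA file holds
exactly one record. The function name and the file names in the comment are
placeholders I made up; SeqIO.read raises a ValueError if the file does not
contain exactly one record, in which case SeqIO.parse or SeqIO.index would
be the thing to use instead:

```python
from Bio import SeqIO


def load_single_sequence(fasta_path):
    """Return the sequence of the only record in a FASTA file as a string.

    Hypothetical helper for illustration; assumes one record per file.
    """
    record = SeqIO.read(fasta_path, "fasta")
    return str(record.seq)


# Example usage (placeholder file names, substitute your own):
# complete_query_seq = load_single_sequence("query.fasta")
# complete_sbjct_seq = load_single_sequence("subject.fasta")
```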

Peter


On Tue, Feb 4, 2014 at 9:12 AM, Edson Ishengoma
<ishengomae at nm-aist.ac.tz> wrote:
> Hi Peter,
>
> My apology, I have updated the code at
> https://gist.github.com/EBIshengoma/efc4ad3e32427891931d to appear exactly
> how I run it from my computer.
>
> Thanks.
>


