[Biopython] Help modify this code so it can do what I want it to do

Wed Feb 5 17:52:17 UTC 2014

Hi Peter,

Wow, that made my day. Thank you very much and keep up the good work.

Regards,

Edson

On Wed, Feb 5, 2014 at 7:07 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi Edson,
>
> I can see where the problem stems from now - it did puzzle me for a while.
> For this part to make sense, query and sbjct need to be the FULL sequence
> of the query and the subject (as given to BLAST as input):
>
>     complete_query_seq += str(query[q_start-1:q_end])
>     complete_sbjct_seq += str(sbjct[sb_start-1:sb_end])
>
> (I had assumed these variables were set up at the beginning of the file,
> which is partly why I asked for the full script.)
>
> However, via the for loop, you are using hsp.query and hsp.sbjct as query
> and sbjct. These are the PARTIAL sequences, aligned with gap characters.
> This might do what you seemed to want:
>
>     complete_query_seq += query.replace("-", "")
>     complete_sbjct_seq += sbjct.replace("-", "")
>
> However, this will only concatenate the fragments covered by an HSP - any
> bit of the query or subject which did not align will not be included. Any
> bit which appears in more than one HSP will be there twice. Also, if
> you're using masking, you'll have XXXXXX regions in the sequence where
> the filter said it was low complexity.
>
> I would instead get the original unmodified query/subject sequences
> from the original FASTA files given to BLAST.
>
> Peter
>
>
> On Tue, Feb 4, 2014 at 9:12 AM, Edson Ishengoma
> <ishengomae at nm-aist.ac.tz> wrote:
> > Hi Peter,
> >
> > My apologies, I have updated the code at
> > https://gist.github.com/EBIshengoma/efc4ad3e32427891931d to appear exactly
> > how I run it from my computer.
> >
> > Thanks.
> >
>
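
The difference between the two approaches Peter describes can be sketched in plain Python. This is only an illustration: the sequence, the coordinates and the helper names slice_full and strip_gaps are made up here and do not come from the original script. full_query stands in for the sequence read from the FASTA file given to BLAST, and hsp_query for hsp.query.

```python
def slice_full(full_seq, start, end):
    """Return the region of a FULL sequence covered by 1-based, inclusive coordinates."""
    return full_seq[start - 1:end]


def strip_gaps(aligned_fragment):
    """Return an aligned (gapped) HSP fragment with the '-' gap characters removed."""
    return aligned_fragment.replace("-", "")


# Made-up full query, as it would appear in the FASTA file given to BLAST.
full_query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

# Made-up HSP values standing in for the HSP's query start, query end and hsp.query.
q_start, q_end = 4, 15
hsp_query = "GCCATT-GTAATG"  # partial sequence, with one gap character

# Slicing the full sequence and stripping gaps from the fragment agree
# for the aligned region.
from_full = slice_full(full_query, q_start, q_end)
from_hsp = strip_gaps(hsp_query)

# Applying the HSP coordinates to the partial, gapped fragment does not:
# the coordinates refer to positions in the full sequence.
wrong = slice_full(hsp_query, q_start, q_end)

print(from_full == from_hsp)
print(wrong == from_full)
```

Stripping gaps only recovers the aligned region. To include unaligned stretches, the full sequence still has to come from the original FASTA file, for example by reading it with Bio.SeqIO.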