[Biopython] Help modify this code so it can do what I want it to do

Mon Feb 3 19:16:55 UTC 2014

Hi Peter,

Sorry, that was a typo. It should be:
complete_sbjct_seq += str(sbjct[sb_start:sb_end])

I tried Ivan's suggestion of passing the tblastn option
[-max_hsps_per_subject 1], but the output still shows up as fragmented hits.

Peter said: "Another approach would be to use the alignment sequence
fragments BLAST gives you (and remove the gap characters)."
With the script I have, I can only extract the first fragment for each
hit. I don't know why the string slicing [sb_start:sb_end] in my script
does not pick up the start and end positions of the subsequent fragments.
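
For reference, Peter's suggestion (use the aligned fragments BLAST reports
and strip the gap characters) could be sketched roughly as below. This is
only a sketch under some assumptions: the names stitch_hsps and
stitch_blast_xml are hypothetical, the Hsp namedtuple is a stand-in for
Biopython's HSP objects so the core function can be tried without a BLAST
file, and the key point is that every HSP of a hit is visited rather than
only the first one. Overlapping HSPs are not merged here, so residues in an
overlap would appear twice.

```python
from collections import namedtuple

# Hypothetical stand-in for a Biopython HSP; only the attributes used below.
Hsp = namedtuple("Hsp", ["query_start", "query", "sbjct"])


def stitch_hsps(hsps):
    """Join the aligned fragments of all HSPs of one hit, in query order.

    Gap characters ('-') are removed from both aligned strings.
    Overlapping HSPs are NOT merged (assumption: fragments do not overlap).
    """
    ordered = sorted(hsps, key=lambda h: h.query_start)
    query_seq = "".join(h.query.replace("-", "") for h in ordered)
    sbjct_seq = "".join(h.sbjct.replace("-", "") for h in ordered)
    return query_seq, sbjct_seq


def stitch_blast_xml(path):
    """Apply stitch_hsps to every hit in a BLAST XML file (needs Biopython)."""
    from Bio.Blast import NCBIXML  # imported lazily; requires Biopython

    results = {}
    with open(path) as handle:
        for record in NCBIXML.parse(handle):
            for alignment in record.alignments:
                # alignment.hsps holds ALL fragments for this hit
                key = (record.query, alignment.hit_def)
                results[key] = stitch_hsps(alignment.hsps)
    return results


if __name__ == "__main__":
    # Two made-up fragments, deliberately given out of order.
    demo = [Hsp(50, "MKTLV", "MK-LV"), Hsp(1, "AC-DE", "ACQDE")]
    print(stitch_hsps(demo))
```

Sorting on query_start puts the fragments in order along the query before
they are joined, which is the sorting step described in the original
question.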

Regards,

Edson

On Mon, Feb 3, 2014 at 4:43 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:

> Hello Edson,
>
> There is an argument that you can pass to tblastn that is called
> max_hsps_per_subject. Try -max_hsps_per_subject 1 and be sure not to
> pass the flag -ungapped. That might do the job for you.
>
> The help says
>
> tblastn -help
> ...
>  *** Statistical options
>  -dbsize <Int8>
>    Effective length of the database
>  -searchsp <Int8, >=0>
>    Effective length of the search space
>  -max_hsps_per_subject <Integer, >=0>
>    Override maximum number of HSPs per subject to save for ungapped searches
>    (0 means do not override)
>    Default = `0'
> ...
>
> Ivan
>
>
>
> Ivan Gregoretti, PhD
>
>
> On Mon, Feb 3, 2014 at 7:19 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Sun, Feb 2, 2014 at 7:28 PM, Edson Ishengoma
> > <ishengomae at nm-aist.ac.tz> wrote:
> >> Hi folks,
> >>
> >> I picked this code from somewhere and edited it a bit, but it still can't
> >> achieve what I need. I have an XML output of tblastn hits on my customized
> >> database and now I am in the process of extracting the results with Biopython.
> >> With tblastn, sometimes the returned hit consists of multiple local hits
> >> corresponding to certain positions along the query with significant scores.
> >> Now I want to concatenate these local hits, which first requires sorting
> >> them according to position.
> >>
> >> ...
> >>                       complete_query_seq += str(query[q_start:q_end])
> >>                       complete_sbjct_seq += str(query[sb_start:sb_end])
> >> ...
> >
> > Shouldn't you be taking a slice from the subject sequence (the database
> > match) there, rather than the query sequence?
> >
> > Another approach would be to use the alignment sequence fragments
> > BLAST gives you (and remove the gap characters).
> >
> > Peter