[Biopython] Help modify this code so it can do what I want it to do
Edson Ishengoma
ishengomae at nm-aist.ac.tz
Mon Feb 3 19:16:55 UTC 2014
Hi Peter,
Sorry, that was a typo; it should be:
complete_sbjct_seq += str(sbjct[sb_start:sb_end]).
I tried Ivan's suggestion of passing the tblastn option
[-max_hsps_per_subject 1], but the output still shows up as fragmented hits.
Peter said: "Another approach would be to use the alignment sequence
fragments BLAST gives you (and remove the gap characters)."
With the script I have, I can extract only the first fragment for each
hit. I don't know why the string slicing [sb_start:sb_end] in my script
does not pick up the start and end positions of the subsequent fragments.
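
For reference, Peter's alternative (using the aligned fragments BLAST
reports and removing the gap characters) might look roughly like the
sketch below. It is only an illustration under stated assumptions: the
Hsp namedtuple is a hypothetical stand-in that borrows the attribute
names query_start, query and sbjct from Biopython's HSP objects, the
helper name join_hsp_fragments is made up, and the sequences are toy
data, not real tblastn output.

```python
from collections import namedtuple

# Hypothetical stand-in for a Biopython HSP object; only the attributes
# used below are modelled (assumption: names match Bio.Blast.Record.HSP).
Hsp = namedtuple("Hsp", ["query_start", "query", "sbjct"])


def join_hsp_fragments(hsps):
    """Sort HSPs by query start and join their aligned sequences, gaps removed."""
    ordered = sorted(hsps, key=lambda hsp: hsp.query_start)
    # "-" is the gap character in the aligned query/subject strings.
    query_seq = "".join(hsp.query.replace("-", "") for hsp in ordered)
    sbjct_seq = "".join(hsp.sbjct.replace("-", "") for hsp in ordered)
    return query_seq, sbjct_seq


# Toy fragments, deliberately listed out of order.
hsps = [
    Hsp(query_start=50, query="MKV-LA", sbjct="MKVTLA"),
    Hsp(query_start=1, query="GGSW", sbjct="GG-W"),
]
query_seq, sbjct_seq = join_hsp_fragments(hsps)
print(query_seq)
print(sbjct_seq)
```

With real data the HSPs would presumably come from alignment.hsps for
each alignment in a record returned by Bio.Blast.NCBIXML.parse(). The
sketch does not handle overlapping fragments or regions of the query
that no fragment covers.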
Regards,
Edson
On Mon, Feb 3, 2014 at 4:43 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
> Hello Edson,
>
> There is an argument that you can pass to tblastn that is called
> max_hsps_per_subject. Try -max_hsps_per_subject 1 and be sure not to
> pass the flag -ungapped. That might do the job for you.
>
> The help says
>
> tblastn -help
> ...
> *** Statistical options
> -dbsize <Int8>
> Effective length of the database
> -searchsp <Int8, >=0>
> Effective length of the search space
> -max_hsps_per_subject <Integer, >=0>
> Override maximum number of HSPs per subject to save for ungapped
> searches
> (0 means do not override)
> Default = `0'
> ...
>
> Ivan
>
>
>
> Ivan Gregoretti, PhD
>
>
> On Mon, Feb 3, 2014 at 7:19 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Sun, Feb 2, 2014 at 7:28 PM, Edson Ishengoma
> > <ishengomae at nm-aist.ac.tz> wrote:
> >> Hi folks,
> >>
> >> I picked this code from somewhere and edited it a bit but it still can't
> >> achieve what I need. I have an xml output of tblastn hits on my customized
> >> database and now I am in the process to extract the results with biopython.
> >> With tblastn sometimes the returned hit is multiple local hits corresponding
> >> to certain positions along the query with significant scores. Now I want to
> >> concatenate these local hits which initially requires sorting according to
> >> positions.
> >>
> >> ...
> >> complete_query_seq += str(query[q_start:q_end])
> >> complete_sbjct_seq += str(query[sb_start:sb_end])
> >> ...
> >
> > Shouldn't you be taking a slice from the subject sequence (the database
> > match) there, rather than the query sequence?
> >
> > Another approach would be to use the alignment sequence fragments
> > BLAST gives you (and remove the gap characters).
> >
> > Peter
> > _______________________________________________
> > Biopython mailing list - Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
>