[Biopython] Help modify this code so it can do what I want it to do

Ivan Gregoretti ivangreg at gmail.com
Mon Feb 3 13:43:17 UTC 2014


Hello Edson,

There is an argument that you can pass to tblastn called
max_hsps_per_subject. Try -max_hsps_per_subject 1 and be sure not to
pass the flag -ungapped. That might do the job for you.

The help says

tblastn -help
...
 *** Statistical options
 -dbsize <Int8>
   Effective length of the database
 -searchsp <Int8, >=0>
   Effective length of the search space
 -max_hsps_per_subject <Integer, >=0>
   Override maximum number of HSPs per subject to save for ungapped searches
   (0 means do not override)
   Default = `0'
...
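
If you would rather keep all the HSPs and stitch them together yourself,
Peter's suggestion quoted below (use the aligned fragments BLAST reports
and drop the gap characters) could look roughly like the sketch here. The
Hsp namedtuple is only a hypothetical stand-in so the example runs on its
own; its field names mirror the query_start, query and sbjct attributes of
the HSP objects you get from Bio.Blast.NCBIXML. The function name is made
up, and the sketch assumes the HSPs do not overlap on the query.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed HSP; the field names mirror the
# attributes of the HSP objects produced by Bio.Blast.NCBIXML.
Hsp = namedtuple("Hsp", ["query_start", "query", "sbjct"])


def concatenate_hsps(hsps):
    """Sort HSPs by query start, then join the gap-free aligned fragments.

    Assumes the HSPs do not overlap on the query; overlapping regions
    would simply be repeated in the output.
    """
    ordered = sorted(hsps, key=lambda h: h.query_start)
    query_seq = "".join(h.query.replace("-", "") for h in ordered)
    sbjct_seq = "".join(h.sbjct.replace("-", "") for h in ordered)
    return query_seq, sbjct_seq


# Toy data, deliberately out of order.
hsps = [
    Hsp(query_start=50, query="MK-LV", sbjct="MKALV"),
    Hsp(query_start=1, query="ACDEF", sbjct="AC-EF"),
]
query_seq, sbjct_seq = concatenate_hsps(hsps)
print(query_seq)
print(sbjct_seq)
```

With real data you would pass alignment.hsps for each alignment in the
parsed record instead of the toy list.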

Ivan



Ivan Gregoretti, PhD


On Mon, Feb 3, 2014 at 7:19 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sun, Feb 2, 2014 at 7:28 PM, Edson Ishengoma
> <ishengomae at nm-aist.ac.tz> wrote:
>> Hi folks,
>>
>> I picked this code from somewhere and edited it a bit but it still can't
>> achieve what I need. I have an xml output of tblastn hits on my customized
>> database and now I am in the process to extract the results with biopython.
>> With tblastn sometimes the returned hit is multiple local hits corresponding
>> to certain positions along the query with significant scores. Now I want to
>> concatenate these local hits which initially requires sorting according to
>> positions.
>>
>> ...
>>                       complete_query_seq += str(query[q_start:q_end])
>>                       complete_sbjct_seq += str(query[sb_start:sb_end])
>> ...
>
> Shouldn't you be taking a slice from the subject sequence (the database
> match) there, rather than the query sequence?
>
> Another approach would be to use the alignment sequence fragments
> BLAST gives you (and remove the gap characters).
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


