[Bioperl-l] Question about the definition of 'gaps' in blast -m8 output...
Phillip San Miguel
pmiguel at purdue.edu
Tue Mar 24 07:53:10 EDT 2009
Dan Bolser wrote:
> 2009/3/19 Phillip San Miguel <pmiguel at purdue.edu>:
>
>> Dan Bolser wrote:
>>
>>> 2009/3/18 Phillip San Miguel <pmiguel at purdue.edu>
>>>
>>>
>>>
>>>> Dan Bolser wrote:
>>>>
>>>>
>>>>
>>>>> Can someone clarify the definition of the 'gaps' column in the blast -m8
>>>>> output format for me?
>>>>>
>>>>> I thought that the column 'gaps' was basically the number of columns in
>>>>> the
>>>>> HSP that contains a gap character.
>>>>>
>>>>>
>>>>>
>>>> Hi Dan,
>>>> "gaps", to me, denotes the number of gaps. Not the total length of all
>>>> the
>>>> gaps.
>>>> Just my interpretation, but given your results my guess is that whomever
>>>> wrote blastall was thinking the way I do.
>>>>
>>>>
>>> Yeah, I'll have to go look at the HSPs to confirm this... I'm just
>>> surprised
>>> that there are not more gaps of length >1. i.e. my data (given your
>>> interpretation) suggests that 90% of the HSPs have no gaps > length 1.
>>>
>>>
>> Sounds about right. Depends on how you have gap opening vs gap lengthening
>> parameters set.
>>
>
> I see. I thought that by default extension was less than opening, so I
> had expected there to be more gaps of length >1 ... anyway... where
> can I read more about selecting parameters for certain tasks?
> Currently I'm blasting tomato against potato sequence, and the two
> organisms are known to be 'highly syntenic' - I'm just not sure how
> that translates into how I should set the parameters. I'm after large
> alignments of large regions of the chromosome. My thinking is to just
> run through the list of HSPs and merge based on gap / window size
> (dynamic programming style) - that way I can play with the set of HSPs
> that I have, and look at the effect of different settings, then I can
> just globally align the matching regions using SW (if I need to). Does
> that sound reasonable, or is using the default settings just dumb?
>
> Cheers,
> Dan.
>
>
Hi Dan,
Sorry, I didn't mean to suggest that the only reason you were seeing
a preponderance of single base indels was due to your settings. I do
expect single base indels to outnumber longer indels.
Nevertheless, I always thought standard alignment tools should not
use a linear gap extension penalty. That past some point, extending a
gap should be further "discounted". Maybe the gap extension penalty
should be the log of the number of bases extended?
BTW, I just noticed that the blastall '-m 9' parameter, includes the
column headers. They are:
# Fields: Query id, Subject id, % identity, alignment length,
mismatches, gap openings, q. start, q. end, s. start, s. end, e-value,
bit score
So, the column in question is "gap openings".
Phillip
More information about the Bioperl-l
mailing list