[Bioperl-l] Add a kind of hspsepQmax/hspsepSmax (like WuBlast has) in Bio::Search::Tiling::MapTiling

Frederic.SAPET at biogemma.com Frederic.SAPET at biogemma.com
Thu Apr 29 15:54:29 UTC 2010


Hi Mark

The kludge works.

I just had to reset the previously calculated value of identities to
undef:

my $identString = "identities_".$type."_exact_".$context;
$tiling->{$identString} = undef;
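
Put together with Mark's steps, the kludge could be wrapped roughly as below. This is only a sketch: replace_coverage_map is a hypothetical helper name, the hash keys are the internal ones quoted in this thread (they may differ between BioPerl versions), and a plain hash stands in for the real tiling object. Other cached values may need the same reset; I have only checked identities.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): swap in a filtered coverage map
# and clear the cached identities so they are recomputed on the next call.
# The hash keys follow the names quoted in this thread; they are internals of
# the tiling object and may differ between BioPerl versions.
sub replace_coverage_map {
    my ($tiling, $type, $context, $new_map) = @_;
    $tiling->{"coverage_map_${type}_${context}"}     = $new_map;
    $tiling->{"identities_${type}_exact_${context}"} = undef;
    return $tiling;
}

# Stand-in for a real tiling object, only to show the effect on the keys.
my $tiling = {
    coverage_map_hit_p0     => [ [ [ 1, 10 ], 'hsp_a' ], [ [ 20, 30 ], 'hsp_b' ] ],
    identities_hit_exact_p0 => 42,
};
my @map = @{ $tiling->{coverage_map_hit_p0} };
shift @map;    # drop the first element, as in Mark's example
replace_coverage_map($tiling, 'hit', 'p0', \@map);
```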

But I think that what I would like to do goes deeper than that.

Sometimes I get results like this:
Chr1  TBLASTN  match_set  2164104  56772932  544  +  .  ID=Sample2;alignLength=1105;eValue=0.0;fractionAligned=95.959595959596;gapNumber=67;Name=Sample2;percentageIdentity=60.0476992143659
Chr1  TBLASTN  match_part  2164104  2174973  630  +  1  Parent=Sample2;Target=Sample2 71 358
Chr1  TBLASTN  match_part  2216917  2218191  1014  +  1  Parent=Sample2;Target=Sample2 70 502
Chr1  TBLASTN  match_part  2218504  2218665  181  +  1  Parent=Sample2;Target=Sample2 533 585
Chr1  TBLASTN  match_part  56771229  56771357  230  +  1  Parent=Sample2;Target=Sample2 25 67
Chr1  TBLASTN  match_part  56772054  56772932  1401  +  1  Parent=Sample2;Target=Sample2 71 364

I would like to see the HSPs separated into two distinct groups.
I had a look at the source code.
Is the method interval_tiling in MapTileUtils.pm a good place to start?
Could I add a new parameter there (something like hspsepQmax/hspsepSmax)?
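
To make the idea concrete, here is a minimal sketch of the grouping I have in mind, written against the coverage map layout Mark described (each element is [ [$start, $end], $hsp ]). The function name split_map_by_gap and the 100000-base threshold are my own assumptions, not anything that exists in BioPerl, and strings stand in for the HSP objects. The coordinates are the match_part lines of the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: split coverage-map entries into groups so that, within
# a group, each interval starts at most $max_sep positions after the furthest
# end seen so far. Entries are assumed to be [ [$start, $end], $payload ].
sub split_map_by_gap {
    my ($map, $max_sep) = @_;
    my @sorted = sort { $a->[0][0] <=> $b->[0][0] } @$map;
    my @groups;
    my $last_end;
    for my $entry (@sorted) {
        my ($start, $end) = @{ $entry->[0] };
        if (!@groups || $start - $last_end > $max_sep) {
            push @groups, [];      # gap too large: open a new group
            $last_end = $end;
        }
        push @{ $groups[-1] }, $entry;
        $last_end = $end if $end > $last_end;
    }
    return @groups;
}

# Hit coordinates from the match_part lines; the second element would
# normally be the HSP object.
my @map = (
    [ [ 2164104,  2174973 ],  'hsp1' ],
    [ [ 2216917,  2218191 ],  'hsp2' ],
    [ [ 2218504,  2218665 ],  'hsp3' ],
    [ [ 56771229, 56771357 ], 'hsp4' ],
    [ [ 56772054, 56772932 ], 'hsp5' ],
);

my @groups = split_map_by_gap(\@map, 100_000);   # threshold is an arbitrary choice
for my $i (0 .. $#groups) {
    my @g = @{ $groups[$i] };
    printf "group %d: %d intervals, %d..%d\n",
        $i + 1, scalar @g, $g[0][0][0], $g[-1][0][1];
}
```

Each group could then be put back as the coverage map for its own match_set, in the same way as the kludge, but I have not checked how this interacts with the rest of the tiling internals.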

Thank you.

Fred


"Mark A. Jensen" <maj at fortinbras.us> wrote on 26/04/2010 15:17:51:

> Hi Fred,
> 
> I'll tell you how you can write a kludge; maybe you can expand it into
> a more general method.
> 
> For your tblastn data, get the coverage map array
> 
>  @map = $tiling->coverage_map('hit', 'p0')
> 
> Each element of the map is a ref to a pair [$int, $hsp], where $int is
> itself a reference to a two-elt array containing the coordinates of the
> hsp in context and $hsp is the hsp object itself. You can use these to
> filter the @map array.
> 
> For your example, you can just get rid of the first @map elt:
> 
>  shift @map;
> 
> Replace the internal map for this type and context, so that
> the methods work on the modified map:
> 
>  $tiling->{'coverage_map_hit_p0'} = \@map;
> 
> Then $tiling->identities('hit', 'exact', 'p0'), etc. give you the
> new values.
> 
> HTH-
> MAJ
> ----- Original Message ----- 
> From: <Frederic.SAPET at biogemma.com>
> To: <bioperl-l at bioperl.org>
> Sent: Friday, April 23, 2010 11:16 AM
> Subject: [Bioperl-l] Add a kind of hspsepQmax/hspsepSmax (like WuBlast has) in Bio::Search::Tiling::MapTiling
> 
> 
> > Hello
> >
> > Based on bp_search2gff.pl script and Bio::Search::Tiling::MapTiling
> > documentation (http://www.bioperl.org/wiki/HOWTO:Tiling), I'm trying to
> > write a generic BLAST to GFF3 parser.
> >
> > My idea is to filter hits on frac_aligned and percent_identity values.
> >
> > I'm facing a problem with a BlastX result and the corresponding TBlastN.
> >
> > Please find my script and the two example files attached.
> >
> > The example is a piece of Maize Chromosome where a protein seems to be
> > duplicated.
> >
> > When I parse the BlastX file and retrieve the data from
> > a Query View ( >tiling.pl BlastX query), I get:
> >
> > Chr6:159690000-159718000  BLASTX  match_set  23971  25620  121.6  +  .  ID=Os03g17980.2:1.1.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
> > Chr6:159690000-159718000  BLASTX  match_part  23971  24186  331  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
> > Chr6:159690000-159718000  BLASTX  match_part  24820  24915  100  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
> > Chr6:159690000-159718000  BLASTX  match_part  25195  25308  89  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
> > Chr6:159690000-159718000  BLASTX  match_part  25390  25620  192  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472
> >
> > Chr6:159690000-159718000  BLASTX  match_set  918  2567  121.6  +  .  ID=Os03g17980.2:1.2.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
> > Chr6:159690000-159718000  BLASTX  match_part  918  1148  192  -  0  Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 395 472
> > Chr6:159690000-159718000  BLASTX  match_part  1230  1343  89  -  0  Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 358 395
> > Chr6:159690000-159718000  BLASTX  match_part  1623  1718  100  -  0  Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 291 322
> > Chr6:159690000-159718000  BLASTX  match_part  2352  2567  331  -  0  Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 120 191
> >
> > This is perfect: I retrieve two nice hits, with perfectly tiled HSPs.
> >
> > But with the TBlastN report (using a Hit View: >tiling.pl TBlastN hit),
> > I get:
> > Chr6:159690000-159718000  TBLASTN  match_set  7666  25620  121.6  +  .  ID=Os03g17980.2:1.1.1;alignLength=303;eValue=4.9e-137;fractionAligned=98.8212180746562;gapNumber=18;Name=Os03g17980.2;percentageIdentity=66.0052390307793
> > Chr6:159690000-159718000  TBLASTN  match_part  7666  7917  44  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 332 416
> > Chr6:159690000-159718000  TBLASTN  match_part  23971  24186  331  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
> > Chr6:159690000-159718000  TBLASTN  match_part  24820  24915  100  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
> > Chr6:159690000-159718000  TBLASTN  match_part  25195  25308  89  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
> > Chr6:159690000-159718000  TBLASTN  match_part  25390  25620  192  +  0  Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472
> >
> > I lose one of my hits because another HSP is tiled into it, so I trash
> > it when I filter the context using identity values (lines 42 to 54 of my
> > script).
> > This HSP is far away on the 5' side, so I would like to know whether it
> > would be possible to add (or whether you could help me develop) a sort of
> > hspsepQmax/hspsepSmax (maximum allowed separation along the query (or
> > subject) sequence between two HSPs) as a new parameter during the tiling
> > phase?
> >
> >
> >
> > Thank you.
> >
> > Fred
> >
> >
> >