[Bioperl-l] Blastx parser misses scores

Jason Stajich jason at cgt.duhs.duke.edu
Thu Sep 4 21:49:06 EDT 2003


The score comes from the hit summary table. If there is no summary line
for the hit you are interested in, we can't really report a score for it;
the parser just provides an object representation of what is in the input
file. You can get the HSP bit score, raw score, and E-value from
$hsp->bits, $hsp->score, and $hsp->evalue.

You can get more hit scores reported by changing your BLAST parameters so
that more hits are listed in the summary (the -v parameter), if you want
to see these summary values. There were some recent messages on this list
about how the summary scores are computed, in case you want to construct
them yourself.
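
As a minimal sketch of such a fallback (this is not something the parser
does for you): the helper below, under the hypothetical name
hit_score_or_best_hsp, returns $hit->raw_score when the summary supplied
one and otherwise the highest $hsp->bits among the hit's HSPs. It assumes
the summary column is in bits, as in the report quoted below, and that the
hit object provides hsps() the way Bio::Search::Hit::GenericHit does.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
# Returns the hit-level score if the summary table supplied one;
# otherwise falls back to the best bit score among the hit's HSPs.
# Returns undef if the hit has no usable score at all.
sub hit_score_or_best_hsp {
    my ($hit) = @_;

    my $score = $hit->raw_score;
    return $score if defined $score;

    my $best;
    for my $hsp ($hit->hsps) {
        my $bits = $hsp->bits;
        next unless defined $bits;
        $best = $bits if !defined $best || $bits > $best;
    }
    return $best;
}
```

Inside a "while (my $hit = $result->next_hit)" loop, calling
hit_score_or_best_hsp($hit) would stand in for the direct $hit->raw_score
call, so hits missing from the summary still get a value.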

-jason

On Fri, 5 Sep 2003, Holland, Richard wrote:

> > Are you talking about the case where you have 50 hits listed in the
> > summary but say only 25 HSP alignments?
>
> Not sure. There are 10 hits listed in the summary and 18 detailed below
> it. We only get scores reported by the parser for the 10 in the summary.
>
> > Can you please provide an example report and code which doesn't
> > behave as you would expect.
>
> The blast report in question is at the end of this email.
>
> Our code follows:
>
> ===========
>
>         my $blastin = Bio::SearchIO->new(-fh=>$fileRef,-format=>"blast");
>
>         while (1) {
>                 my $result = $blastin->next_result;
>                 if (not $result) { last; }
>
>                 my $QueryID = $result->query_name;
>                 my $QueryLength = $result->query_length;
>
>                 while(my $hit = $result->next_hit()) {
>                         my $hitid = $hit->name;
>                         my $score = $hit->raw_score;
>                         my $description = $hit->name . " " . $hit->description;
>                         while (my $hsp = $hit->next_hsp) {
>                                 my $expectation = $hsp->evalue;
>                                 my $frame = ($hsp->query->frame + 1) * $hsp->query->strand;
>                                 my $strand = $hsp->strand;
>                                 my $hitlength = $hit->length;
>                                 my $identities = $hsp->num_identical;
>                                 my $overlaps = $hsp->length('total');
>                                 my $gaps = $hsp->gaps;
>                                 my $qstart = $hsp->start('query');
>                                 my $qstop = $hsp->end('query');
>                                 my $hstart = $hsp->start('hit');
>                                 my $hstop = $hsp->end('hit');
>                                 my $positives = $hsp->num_conserved;
>                                 # Truncated - code goes here that processes the results
>                         }
>                 }
>         }
>
> ===========
>
> The blast report looks like this. In the code above, all scores
> ($hit->raw_score) for hits ">SW:SSRP_DROME Q05344 drosophila
> melanogaster (fruit fly). single-strand recognition" onwards come out as
> null:
>
> ===========
>
> BLASTX 2.2.4 [Aug-26-2002]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
>
> Query= 010404CS0701000001
>          (668 letters)
>
> Database: /home/seqstore/ncbi/blast/data/swplus
>            954,989 sequences; 303,757,025 total letters
>
> Searching..................................................done
>
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value
>
> SP_PL:O04235 O04235 vicia faba (broad bean). transcription facto...   358   3e-98
> SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (mada...   313   9e-85
> SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). str...   309   1e-83
> SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). ...   306   1e-82
> SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early ...   306   1e-82
> SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002          301   3e-81
> SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87....   120   9e-27
> SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure s...   115   5e-25
> SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific re...   114   6e-25
> SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific re...   108   5e-23
>
> >SP_PL:O04235 O04235 vicia faba (broad bean). transcription factor.
> 10/2002
>           Length = 642
>
>  Score =  358 bits (919), Expect = 3e-98
>  Identities = 172/194 (88%), Positives = 184/194 (94%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            MTDGHLFNNITLG RGGTNPGQIKI+SGGILWKRQGGGK+I+VDK DI+ VTWMKVP++N
> Sbjct: 1   MTDGHLFNNITLGXRGGTNPGQIKIYSGGILWKRQGGGKTIDVDKTDIMGVTWMKVPKTN
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>            QLGVQIKDGL YKFTGFRDQDV+SLTNFFQNTFGI V+EKQLSV+GRNWG+VDLNGNMLA
> Sbjct: 61  QLGVQIKDGLLYKFTGFRDQDVVSLTNFFQNTFGITVEEKQLSVTGRNWGEVDLNGNMLA
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
> 620
>            FMVGSKQAFEV LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLME+SFHIP+SNTQFV
> Sbjct: 121 FMVGSKQAFEVSLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEMSFHIPSSNTQFV
> 180
>
> Query: 621 GDENTPPXQVFRXK 662
>            GDEN P  QVFR K
> Sbjct: 181 GDENRPSAQVFRDK 194
>
>
> >SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (madagascar
> periwinkle).
>             structure-specific recognition protein 1 homolog (hmg
>             protein). 9/2003
>           Length = 639
>
>  Score =  313 bits (802), Expect = 9e-85
>  Identities = 153/194 (78%), Positives = 174/194 (88%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            M DGHLFNNITLGGRGGTNPGQ+++ SGGILWK+QGG K++EVDK+D+V +TWMKVPRSN
> Sbjct: 1   MADGHLFNNITLGGRGGTNPGQLRVHSGGILWKKQGGAKAVEVDKSDMVGLTWMKVPRSN
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>            QLGV+IKDGLFYKFTGFRDQDV SLT++ Q+T GI  +EKQLSVSG+NWG+VDLNGNML
> Sbjct: 61  QLGVRIKDGLFYKFTGFRDQDVASLTSYLQSTCGITPEEKQLSVSGKNWGEVDLNGNMLT
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEF/MWMTQLEPM\EKDSLMEISFHIPNSNTQ
> 614
>            F+VGSKQAFEV LADV+QT LQGKNDV+LEF MWM  LE M  K+SLMEISFH+PNSNTQ
> Sbjct: 121 FLVGSKQAFEVSLADVAQTQLQGKNDVMLEF MWMILLEQM RKNSLMEISFHVPNSNTQ
> 178
>
> Query: 615 FVGDENTPPXQVFRXK 662
>            FVGDEN PP QVFR K
> Sbjct: 179 FVGDENRPPAQVFRDK 194
>
>
> >SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress).
> structure-specific
>             recognition protein 1 homolog (hmg protein). 9/2003
>           Length = 646
>
>  Score =  309 bits (792), Expect = 1e-83
>  Identities = 148/191 (77%), Positives = 167/191 (86%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            M DGH FNNI+L GRGG NPG +KI SGGI WK+QGGGK++EVD++DIVSV+W KV +SN
> Sbjct: 1   MADGHSFNNISLSGRGGKNPGLLKINSGGIQWKKQGGGKAVEVDRSDIVSVSWTKVTKSN
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>            QLGV+ KDGL+YKF GFRDQDV SL++FFQ+++G    EKQLSVSGRNWG+VDL+GN L
> Sbjct: 61  QLGVKTKDGLYYKFVGFRDQDVPSLSSFFQSSYGKTPDEKQLSVSGRNWGEVDLHGNTLT
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
> 620
>            F+VGSKQAFEV LADVSQT LQGKNDV LEFHVDDT GANEKDSLMEISFHIPNSNTQFV
> Sbjct: 121 FLVGSKQAFEVSLADVSQTQLQGKNDVTLEFHVDDTAGANEKDSLMEISFHIPNSNTQFV
> 180
>
> Query: 621 GDENTPPXQVF 653
>            GDEN PP QVF
> Sbjct: 181 GDENRPPSQVF 191
>
>
> >SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). 10/2002
>           Length = 641
>
>  Score =  306 bits (784), Expect = 1e-82
>  Identities = 141/190 (74%), Positives = 164/190 (86%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+
> Sbjct: 1   MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>            QLGV+ KDGLFYKF GFR+QDV SLTNF Q   G++  EKQLSVSG+NWG +D+NGNML
> Sbjct: 61  QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
> 620
>            FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+
> Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL
> 180
>
> Query: 621 GDENTPPXQV 650
>            GDEN    QV
> Sbjct: 181 GDENRTAAQV 190
>
>
> >SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early drought
> induced
>             protein. 3/2003
>           Length = 641
>
>  Score =  306 bits (784), Expect = 1e-82
>  Identities = 141/190 (74%), Positives = 164/190 (86%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+
> Sbjct: 1   MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>            QLGV+ KDGLFYKF GFR+QDV SLTNF Q   G++  EKQLSVSG+NWG +D+NGNML
> Sbjct: 61  QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
> 620
>            FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+
> Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL
> 180
>
> Query: 621 GDENTPPXQV 650
>            GDEN    QV
> Sbjct: 181 GDENRTAAQV 190
>
>
> >SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002
>           Length = 639
>
>  Score =  301 bits (772), Expect = 3e-81
>  Identities = 138/190 (72%), Positives = 162/190 (84%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            MTDGH FNNI LGGRGGTNPGQ K+ SGG+ WKRQGGGK+IE+DKAD+ +VTWMKVPR+
> Sbjct: 1   MTDGHHFNNILLGGRGGTNPGQFKVHSGGLAWKRQGGGKTIEIDKADVTAVTWMKVPRAY
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>            QLGV+IK GLFY+F GFR+QDV +LTNF Q   G+   EKQLSVSG+NWG +D++GNML
> Sbjct: 61  QLGVRIKAGLFYRFIGFREQDVSNLTNFIQKNMGVTPDEKQLSVSGQNWGGIDIDGNMLT
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
> 620
>            FMVGSKQAFEV L DV+QT +QGK DV+LE HVDDTTGANEKDSLM++SFH+P SNTQFV
> Sbjct: 121 FMVGSKQAFEVSLPDVAQTQMQGKTDVLLELHVDDTTGANEKDSLMDLSFHVPTSNTQFV
> 180
>
> Query: 621 GDENTPPXQV 650
>            GDE+ PP  +
> Sbjct: 181 GDESRPPAHI 190
>
>
> >SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87.
> 10/2002
>           Length = 693
>
>  Score =  120 bits (302), Expect = 9e-27
>  Identities = 64/173 (36%), Positives = 100/173 (56%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            M D   FN+I    +G  N G++++   G+++K    GK   +  ADI  V W +V   +
> Sbjct: 1   MADTLEFNDIYQEVKGSMNDGRLRLSRAGLMYKNNKTGKVENISAADIAEVVWRRVALGH
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>             + +    G  YK+ GFR+ +   L ++F++ F + + EK L V G NWG V   G +L+
> Sbjct: 61  GIKLLTNGGHVYKYDGFRETEYDKLFDYFKSHFSVELVEKDLCVKGWNWGSVRFGGQLLS
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>            F +G + AFE+PL++VSQ    GKN+V LEFH +D    + + SLMEI F++P
> Sbjct: 121 FDIGDQPAFELPLSNVSQCT-TGKNEVTLEFHQND----DSEVSLMEIRFYVP 168
>
>
> >SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure specific
>             recognition protein 1. 3/2003
>           Length = 711
>
>  Score =  115 bits (287), Expect = 5e-25
>  Identities = 59/167 (35%), Positives = 97/167 (57%)
>  Frame = +3
>
> Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
> 278
>            FN+I    +G  N G++++   GI++K    GK   +   ++    W +V   + L +
> Sbjct: 7   FNDIFQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT
> 66
>
> Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
> 458
>            K+G  YK+ GFR+ +   L++FF+  + + + EK L V G NWG V   G +L+F +G +
> Sbjct: 67  KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
> 126
>
> Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>              FE+PL++VSQ    GKN+V LEFH +D    + + SLME+ F++P
> Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168
>
>
> >SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific
> recognition protein 1
>             (ssrp1) (recombination signal sequence recognition
>             protein) (t160) (chromatin-specific transcription
>             elongation factor 80 kda subunit) (fact 80 kda subunit).
>             9/2003
>           Length = 709
>
>  Score =  114 bits (286), Expect = 6e-25
>  Identities = 58/167 (34%), Positives = 97/167 (57%)
>  Frame = +3
>
> Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
> 278
>            FN++    +G  N G++++   GI++K    GK   +   ++    W +V   + L +
> Sbjct: 7   FNDVYQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT
> 66
>
> Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
> 458
>            K+G  YK+ GFR+ +   L++FF+  + + + EK L V G NWG V   G +L+F +G +
> Sbjct: 67  KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
> 126
>
> Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>              FE+PL++VSQ    GKN+V LEFH +D    + + SLME+ F++P
> Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168
>
>
> >SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific
> recognition protein 1
>             (ssrp1) (recombination signal sequence recognition
>             protein) (t160). 9/2003
>           Length = 708
>
>  Score =  108 bits (270), Expect = 5e-23
>  Identities = 56/167 (33%), Positives = 95/167 (56%)
>  Frame = +3
>
> Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
> 278
>            FN+I    +G  N G++++   GI++K    GK   +   ++    W +V   + L +
> Sbjct: 7   FNDIFQEVKGSMNDGRLRLSPSGIIFKNSKTGKVDNIQAGELTEGIWPRVALGHGLKLLT
> 66
>
> Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
> 458
>            K+G  YK+ GFR+ +   L++FF+  + + + EK L V G NWG V   G +L+F +G +
> Sbjct: 67  KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
> 126
>
> Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>              FE+PL++VS    Q + +V LEFH +D    + + SLME+ F++P
> Sbjct: 127 PVFEIPLSNVSSVP-QARIEVTLEFHQND----DPEVSLMEVRFYVP 168
>
>
> >SW:SSRP_DROME Q05344 drosophila melanogaster (fruit fly). single-strand
> recognition
>             protein (ssrp) (chorion-factor 5). 9/2003
>           Length = 723
>
>  Score =  101 bits (251), Expect = 7e-21
>  Identities = 63/173 (36%), Positives = 92/173 (52%)
>  Frame = +3
>
> Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
> 260
>            MTD   +N+I    RG    G++K+    I++K    GK  ++   DI  +   K   +
> Sbjct: 1   MTDSLEYNDINAEVRGVLCSGRLKMTEQNIIFKNTKTGKVEQISAEDIDLINSQKFVGTW
> 60
>
> Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
> 440
>             L V  K G+ ++FTGFRD +   L  F +  +   + EK++ V G NWG     G++L+
> Sbjct: 61  GLRVFTKGGVLHRFTGFRDSEHEKLGKFIKAAYSQEMVEKEMCVKGWNWGTARFMGSVLS
> 120
>
> Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>            F   SK  FEVPL+ VSQ  + GKN+V LEFH +D         L+E+ FHIP
> Sbjct: 121 FDKESKTIFEVPLSHVSQC-VTGKNEVTLEFHQNDDAPV----GLLEMRFHIP 168
>
>
> >SP_FUN:O94529 O94529 schizosaccharomyces pombe (fission yeast).
> putative structure
>             specific recognition protein. 3/2003
>           Length = 512
>
>  Score = 96.7 bits (239), Expect = 2e-19
>  Identities = 48/161 (29%), Positives = 86/161 (52%), Gaps = 2/161 (1%)
>  Frame = +3
>
> Query: 138 PGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRD
> 317
>            PG+++I   G+ WK     +   +  ++I    W +  R  +L + +K        GF
> Sbjct: 19  PGKLRIAPSGLGWKSPSLAEPFTLPISEIRRFCWSRFARGYELKIILKSKDPVSLDGFSQ
> 78
>
> Query: 318 QDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQT
> 497
>            +D+  L N  +  F + +++K+ S+ G NWG+ +  G+ L F V S+ AFE+P++ V+ T
> Sbjct: 79  EDLDDLINVIKQNFDMGIEQKEFSIKGWNWGEANFLGSELVFDVNSRPAFEIPISAVTNT
> 138
>
> Query: 498 NLQGKNDVILEFHV--DDTTGANEKDSLMEISFHIPNSNTQ 614
>            NL GKN+V LEF    D    + + D L+E+  ++P +  +
> Sbjct: 139 NLSGKNEVALEFSTTDDKQIPSAQVDELVEMRLYVPGTTAK 179
>
>
> >SW:SSRP_CHICK Q04678 gallus gallus (chicken). structure-specific
> recognition
>             protein 1 (ssrp1) (recombination signal sequence
>             recognition protein) (t160) (fragment). 9/2003
>           Length = 669
>
>  Score = 95.9 bits (237), Expect = 3e-19
>  Identities = 48/131 (36%), Positives = 79/131 (59%)
>  Frame = +3
>
> Query: 207 VDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQL
> 386
>            +  +++    W +V   + L +  K+G  YK+ GFR+ +   L++FF+  + + + EK L
> Sbjct: 5   IQASELAEGVWRRVALGHGLKLLTKNGHVYKYDGFRESEFDKLSDFFKAHYRLELAEKDL
> 64
>
> Query: 387 SVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEK
> 566
>             V G NWG V   G +L+F +G +  FE+PL++VSQ    GKN+V LEFH +D    + +
> Sbjct: 65  CVKGWNWGTVRFGGQLLSFDIGEQPVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAE
> 119
>
> Query: 567 DSLMEISFHIP 599
>             SLME+ F++P
> Sbjct: 120 VSLMEVRFYVP 130
>
>
> >SP_IN:Q8IL56 Q8il56 plasmodium falciparum (isolate 3d7). structure
> specific
>             recognition protein, putative. 3/2003
>           Length = 506
>
>  Score = 94.0 bits (232), Expect = 1e-18
>  Identities = 50/170 (29%), Positives = 89/170 (51%), Gaps = 5/170 (2%)
>  Frame = +3
>
> Query: 120 GRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN-----QLGVQIKD
> 284
>            G GG++ G  ++ +  + WK +      +   +DI    W+K   +N     +LG + K+
> Sbjct: 21  GFGGSDFGSFRMSNEFLGWKNKKTNNVYQYKCSDIDEGCWIKTSYNNNRLHLKLG-ESKE
> 79
>
> Query: 285 GLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQA
> 464
>             +   F GF D++V  +T  FQ  F I +  ++++  G NWG+  L  + L F + +K A
> Sbjct: 80  NIIIYFDGFPDRNVNEITQHFQKYFNIRLNNRKIATKGWNWGEFKLENSNLCFDIDNKYA
> 139
>
> Query: 465 FEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQ 614
>            F +P  +++Q N+Q K D+ +EF  D+      +D L EI F+ P+ N +
> Sbjct: 140 FNLPTNNINQLNVQIKTDIAMEFKNDENNNKGNEDFLAEIRFYYPHENDE 189
>
>
> >SW:SSRP_CAEEL P41848 caenorhabditis elegans. probable
> structure-specific
>             recognition protein 1 (ssrp1) (recombination signal
>             sequence recognition protein). 9/2003
>           Length = 697
>
>  Score = 92.0 bits (227), Expect = 4e-18
>  Identities = 48/153 (31%), Positives = 82/153 (53%)
>  Frame = +3
>
> Query: 141 GQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQ
> 320
>            G +K+    + +K   GGKS+ V  +DI  + W K+     L V + DG  ++F GF+D
> Sbjct: 20  GTLKLTEKSLNFKGDKGGKSVNVTGSDIDKLKWQKLGNKPGLRVGLNDGGAHRFGGFKDT
> 79
>
> Query: 321 DVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTN
> 500
>            D+  + +F  + +  ++ +  L + G N+G  ++ G  + F    K  FE+P  +VS
> Sbjct: 80  DLEKIQSFTSSNWSQSIDQSNLFIKGWNYGQAEVKGKTVEFSWEDKPIFEIPCTNVSNV-
> 138
>
> Query: 501 LQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>            +  KN+ +LEFH +D    + K  LME+ FH+P
> Sbjct: 139 IANKNEAVLEFHQND----DSKVQLMEMRFHMP 167
>
>
> >SW:YMG9_YEAST Q04636 saccharomyces cerevisiae (baker's yeast).
> hypothetical 63.0
>             kda protein in dak1-orc1 intergenic region. 5/2000
>           Length = 552
>
>  Score = 89.0 bits (219), Expect = 4e-17
>  Identities = 50/161 (31%), Positives = 80/161 (49%), Gaps = 8/161 (4%)
>  Frame = +3
>
> Query: 141 GQIKIFSGGILWK--RQGGGKSIEVDK------ADIVSVTWMKVPRSNQLGVQIKDGLFY
> 296
>            G+ +I   G+ WK    GG  + +  K       ++ +V W +  R   L +  K+
> Sbjct: 17  GRFRIADSGLGWKISTSGGSAANQARKPFLLPATELSTVQWSRGCRGYDLKINTKNQGVI
> 76
>
> Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP
> 476
>            +  GF   D   + N F   F I V++++ S+ G NWG  DL  N + F +  K  FE+P
> Sbjct: 77  QLDGFSQDDYNLIKNDFHRRFNIQVEQREHSLRGWNWGKTDLARNEMVFALNGKPTFEIP
> 136
>
> Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
>             A ++ TNL  KN+V +EF++ D       D L+E+ F+IP
> Sbjct: 137 YARINNTNLTSKNEVGIEFNIQDEEYQPAGDELVEMRFYIP 177
>
>
> >SP_IN:O01683 O01683 caenorhabditis elegans. c32f10.5 protein. 3/2003
>           Length = 689
>
>  Score = 86.7 bits (213), Expect = 2e-16
>  Identities = 50/186 (26%), Positives = 90/186 (47%)
>  Frame = +3
>
> Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
> 278
>            F  + +   G    G + +    I +    GGKS+ +   D+  + W K+     L V +
> Sbjct: 6   FKGVYVEDIGHLTCGTLTLTENSINFIGDKGGKSVYITGTDVDKLKWQKLGNKPGLRVGL
> 65
>
> Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
> 458
>             DG  ++F GF D D+  + +F  + +  ++ +  L ++G N+G  D+ G  + F   ++
> Sbjct: 66  SDGGAHRFGGFLDDDLQKIQSFTSSNWSKSINQSNLFINGWNYGQADVKGKNIEFSWENE
> 125
>
> Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFVGDENTP
> 638
>              FE+P  +VS   +  KN+ ILEFH ++      K  LME+ FH+P        +E+T
> Sbjct: 126 PIFEIPCTNVSNV-IANKNEAILEFHQNE----QSKVQLMEMRFHMP---VDLENEEDTD
> 177
>
> Query: 639 PXQVFR 656
>              + F+
> Sbjct: 178 KVEEFK 183
>
>
> >SP_FUN:Q9HFC4 Q9hfc4 zygosaccharomyces rouxii (candida mogii).
> ssrp1-like protein
>             (fragment). 10/2002
>           Length = 542
>
>  Score = 85.1 bits (209), Expect = 5e-16
>  Identities = 48/165 (29%), Positives = 79/165 (47%), Gaps = 8/165 (4%)
>  Frame = +3
>
> Query: 141 GQIKIFSGGILWKRQGGGKSIE--------VDKADIVSVTWMKVPRSNQLGVQIKDGLFY
> 296
>            G+ +I   G+ WK    G S          +   ++ +V W +  R  +L V  K+
> Sbjct: 45  GRFRIADSGLGWKSANAGGSAANQSKQPFLLPATELSTVQWSRGCRGFELKVNTKNQGVV
> 104
>
> Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP
> 476
>            +  GF   D   + N F   F + V+ K+ S+ G NWG  DL  N + F +  + +FEVP
> Sbjct: 105 QLDGFAPDDFNLIKNDFHRRFNVQVEPKEHSLRGWNWGKADLARNEMVFALNGRPSFEVP
> 164
>
> Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNT 611
>             A ++ TNL  K +V +EF++ D       D L+E+  ++P + T
> Sbjct: 165 YARINNTNLTSKTEVAIEFNLADENYQPAGDELVEMRLYVPGTVT 209
>
>
>   Database: /home/seqstore/ncbi/blast/data/swplus
>     Posted date:  Apr 15, 2003 12:04 PM
>   Number of letters in database: 303,757,025
>   Number of sequences in database:  954,989
>
> Lambda     K      H
>    0.318    0.135    0.401
>
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140
>
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 385,793,622
> Number of Sequences: 954989
> Number of extensions: 8541745
> Number of successful extensions: 21678
> Number of sequences better than 1.0e-06: 36
> Number of HSP's better than  0.0 without gapping: 21171
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test: 0
> Number of HSP's gapped (non-prelim): 21664
> length of database: 303,757,025
> effective HSP length: 116
> effective length of database: 192,978,301
> effective search space used: 20455699906
> frameshift window, decay const: 50,  0.1
> T: 12
> A: 40
> X1: 16 ( 7.3 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.7 bits)
>
> ===========
>
> Richard Holland
> Bioinformatics Database Developer
> ITS, Agresearch Invermay x3279
>
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> Sent: Friday, 5 September 2003 9:39 a.m.
> To: Holland, Richard
> Cc: bioperl-l at bioperl.org; McCulloch, Alan
> Subject: Re: [Bioperl-l] Blastx parser misses scores
>
>
> Can you please provide an example report and code which doesn't behave
> as you would expect.
>
> Are you talking about the case where you have 50 hits listed in the
> summary but say only 25 HSP alignments?
>
>
> On Fri, 5 Sep 2003, Holland, Richard wrote:
>
> > Hi,
> >
> > I have run into a problem with Bio::SearchIO::blast parsing blastx
> > result files. This may affect other blast outputs as well but I'm not
> > sure.
> >
> > At the top of a blastx output there is a summary of the best hits in
> > the results file. Then, all the hits are listed, even the ones which
> > are not in the best hits list.
> >
> > The Bio::Perl parser successfully parses all the hits from the file,
> > however it only returns scores for those which appear in the summary.
> > I have found the code which does this in Bio::SearchIO::blast and
> > noticed that this seems to be deliberate - in all cases, blastx or
> > not, the scores are taken from the summary, and the scores in the hit
> > details appear to be ignored.
> >
> > Is this a feature or a bug? We would like to be able to use Bio::Perl
> > to parse out all the results from our blast reports including all
> > their scores and details, regardless of whether or not they appear in
> > the best hits summary.
> >
> > Can anyone help?
> >
> > cheers,
> > Richard
> > =======================================================================
> > Attention: The information contained in this message and/or attachments
> > from AgResearch Limited is intended only for the persons or entities
> > to which it is addressed and may contain confidential and/or privileged
> > material. Any review, retransmission, dissemination or other use of, or
> > taking of any action in reliance upon, this information by persons or
> > entities other than the intended recipients is prohibited by AgResearch
> > Limited. If you have received this message in error, please notify the
> > sender immediately.
> > =======================================================================
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

