[Bioperl-l] bl2seq parse nightmare

Jason Stajich jason at cgt.duhs.duke.edu
Thu Jun 5 00:26:38 EDT 2003


(Bio::AlignIO::bl2seq just uses Bio::Tools::BPbl2seq so problems would be
identical)

I have no problem getting sequence with gaps with code that looks like
this.  I'm testing this on a protein sequence, but it should make no
difference.  I tested on a bl2seq rpt based on files from yr last msg, but
the query sequence has no gaps (if you make seq1 the query) and the
subject does have gaps and they show up so not sure what the problem is.

#!/usr/bin/perl -w
use strict;
use Bio::Tools::BPbl2seq;

my ($rt,$file) = @ARGV;
my $report = new Bio::Tools::BPbl2seq(-report_type => $rt,
				      -file        => $file);

while(my $hsp = $report->next_feature) {
    print "qstrand : ",$hsp->query->strand, "\n";
    print "hstrand : ",$hsp->sbjct->strand, "\n";

    print $hsp->querySeq, "\n";
    print $hsp->sbjctSeq, "\n";
}

[jason at sonogno test]$ perl -I ~/bioperl/branch-1.2 BPbl2seq_parse.pl blastp t/data/bl2seq.out
qstrand : 0
hstrand : 0
QFLEFQDKFNKKY-SHEEYLERFEIFKSNLGKIEELNLIAINHKADTKFGVNKFADLSSDEFKNYYLNNKEAIFTDDLPVADYLDDEFINSIPTAFDWRTRGAVTPVKNQGQCGSCWSFSTTGNVEGQHFISQNKLVSLSEQNLVDCDHECMEYEGEEACDEGCNGGLQPNAYNYIIKNGGIQTESSYPYTAETGTQCNFNSANIGAKISNFTMIPKN-ETVMAGYIVSTGPLAIAADAVE-WQFYIGGVF---DIPCNPNSLDHGILIVGYSAKNTIFRKNMPYWIVKNSWGADWGEQGYIYLRRGKNTCGVSNFVSTSII
RFARFAVRYGKSYESAAEVRRRFRIFSESLEEVRSTNRKGLPYR----LGINRFSDMSWEEFQATRLG---AAQTCSATLAGNHLMRDAAALPETKDWREDGIVSPVKNQAHCGSCWTFSTTGALEAAYTQATGKNISLSEQQLVDCAGGFNNF--------GCNGGLPSQAFEYIKYNGGIDTEESYPYKGVNGV-CHYKAENAAVQVLDSVNITLNAEDELKNAVGLVRPVSVAFQVIDGFRQYKSGVYTSDHCGTTPDDVNHAVLAVGYGVEN-----GVPYWLIKNSWGADWGDNGYFKMEMGKNMCAIATCASYPVV


You may want to really think about using the latest code from CVS -
Bio::Search::HSP::HSPI objects have a method called seq_inds which
computes where the gaps, mismatches, conserved, and identical positions
are for you already.

I added bl2seq parsing to the blast parser so you can use it directly
from this.

http://cvs.open-bio.org tells you how to do a checkout of the code.


Good luck.
-jason
On Thu, 5 Jun 2003, Julio Fernandez Banet wrote:

> Hello.
> I'm parsing my bl2seq seq report.
> I tried with BPbl2seq and with alignIO but my problem is exactly the same in
> both cases.
> When I call both sequences in the alignment, I get the query sequence with
> gaps on it (ie: agtc-aactggtca) but when it comes to the subject the gaps
> disapear (ie: tcagttgaccagt).
> This is a big problem for me as I would like to get the values in the
> position where the cigar line says is a mismatch and when a gap is in the
> subject is gives me the wrong values.
> Please I'm completely desperate as it's the last bug I need to fix to finish
> my program so any help or advice will be really wellcome.
> Thanks a lot.
> Julio
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list