[Bioperl-l] RE: Local bl2seq

Barry Moore bmoore at genetics.utah.edu
Tue Aug 23 14:40:51 EDT 2005


Usha-

I think the code below will wrap your existing code in the loop you
need.  You will want to get a copy of a good perl programming book like
Programming Perl from O'Reilly.  It will help you out with all those
little perl details like loop structures etc.

Barry

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;

my $seqio_obj = Bio::SeqIO->new(-file => " sequences.fasta",
                                -format => "fasta" );

my $seq_obj = $seqio_obj->next_seq;

open (IN, " location/of/your/probe/file") or die "Can't open IN";

while (my $row = <IN>) {
    chomp $row;
    #Assuming your file is tab delimited...
    my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/,
$row;

    my $input2 = Bio::Seq->new(-id=>"testquery2",
                               -seq=> $probe_sequence
                            );

    my $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn',
                                                        'outfile' =>
'bl2seq1.out');

    my $blast_report = $factory->bl2seq($seq_obj, $input2);

    #Here is where you want to throw out good matches.  You'll need to
determine
    #what method you want to do that.  Maybe since you want there to be
no good
    #hits you would just call $blast_report->max_significance and make
sure it's
    #value is too high to be significant.
    if ($blast_report->max_significance > 0.01) {
        print "$row\n";
    }
}

-----Original Message-----
From: Usha Rani Reddi [mailto:ureddi at emich.edu] 
Sent: Tuesday, August 23, 2005 5:51 AM
To: Barry Moore
Cc: bioperl-l at portal.open-bio.org
Subject: Local bl2seq

Hi,
I am trying to use BLAST to compare the sequences. I did the program in
Bioperl. Below is my piece of code
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
                             -format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
$input2 = Bio::Seq->new(-id=>"testquery2",
                                 
-seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
          
$factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn','outfile' => 'bl2seq1.out');
$blast_report = $factory->bl2seq($seq_obj, $input2);

I need help for looping input2. I want to extract this part of sequence
from a file containing 200000 records. Using perl I am extracting the
sequence part for file of format given below.
SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG

code for extracting PROBE_SEQUENCE looks like this

$NemSeq =<STDIN>;

chomp $NemSeq;

unless (open(seqfile, $NemSeq)) {
print "Cannot open file \n";
exit;
}
@NemSeq = <seqfile> ;

close seqfile;

for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
    #print $k, $NemSeq[$k];
    @Nem =split(/\t/,$NemSeq[$k]);
    $input= $Nem[3];

    #print scalar(@Nem);
    #print $Nem[3], "\n";
    
}


@Nem =split(/\t/,$NemSeq)

$input2 = substr(@NemSeq,4,32);

So far I could successfully use bioperl(bl2seq) to compare whole genome
with single probe. 
I want to compare all the 200000 thousand probes. I am interested only
in mismatches, for this particular scenario my assumption is that more
than 90% of them will match. I want to send only the mismatches to
output file and discard the matches. I would like to classify the
mismatches based on the percentage dissimilarity, is there a way in
Bioperl for this? Thanks a lot for the reply. Please help me with this.
Thanks
Usha


----- Original Message -----
From: Barry Moore <barry.moore at genetics.utah.edu>
Date: Monday, August 22, 2005 11:45 pm
Subject: Re: [Bioperl-l] bl2seq

> Usha,
> 
> The best advice I can give you is that you need to focus your 
> question a 
> bit more.  What method are you using to compare your probe to your 
> fasta?  Regex, BLAST, Needle, RNAHybrid...?  You say your sequence 
> is 
> working fine for single sequence.  Are you using Bioperl for that?  
> Can 
> you tell us exactly what isn't working for you or what questions 
> you 
> have about working with multiple sequences?  Are you already using 
> Bioperl with your single sequence comparison? Can you show us some 
> code?
> Barry
> 
> Usha Rani Reddi wrote:
> 
> >Hi,
> >I am trying to compare two hundred thousand probes(each one of 
> them) to
> >another genome. Format of the file containing probes is like this
> >SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
> >NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
> >NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
> >NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
> >NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
> >NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
> >NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
> >NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
> >NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> >What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> >Streptococcus agalactiae. I am trying to loop through the probes 
> but not
> >sure how to proceed. My program is working fine for single 
> sequence. One
> >more thing is I am not interested in matches, I want to display only
> >mismatches. I am new to Bioperl, some one please help me with this.
> >Thanks
> >Usha
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >  
> >
> 
> -- 
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
> 
> 
> 
> 



More information about the Bioperl-l mailing list