[Bioperl-l] trying to save blast hit sequences to fasta file

Xianran Li xianranli78 at yahoo.com.cn
Thu Aug 2 08:56:04 UTC 2007


----- Original Message ----- 
From: "Alicia Amadoz" <Alicia.Amadoz at uv.es>
To: "Torsten Seemann" <torsten.seemann at infotech.monash.edu.au>; <bioperl-l at bioperl.org>
Cc: <jay at jays.net>
Sent: Thursday, August 02, 2007 3:06 PM
Subject: Re: [Bioperl-l] trying to save blast hit sequences to fasta file


> Hi, thanks for your help and suggestions. I have tried the example code
> of Jay Hannah and it works perfectly. But what I need to save in fasta
> format is the whole sequence in the database that is similar to my query
> sequence. I don't understand very well the difference between
> hit_string() and query_string(), are they the whole sequence that is
> similiar (about hit_string), a part of the whole sequence or just the
> part that is aligned to my query string? 

The hit_string() returns the  aligned sequences of the subject in your database and the query_string() is the aligned sequences of the query. These two things will be the same unless there are some mutations and or gaps within the alignment. 

> 
> With the previous code what I have are different sequences in length
> with the same id as my query string, so I am not sure that I am doing
> what I need to do. Any light on this point?

Did you specify the $id before 
  
my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq); 

If you didn't, then all the sequences retrieved will get the same id. The following is a simply way to avoid this problem.

my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" =>"fasta");                                                           
my $i;                                                                    
while(my $result = $blast_report->next_result()) {                        
   while(my $hit = $result->next_hit()) {                                 
      while(my $hsp = $hit->next_hsp()) {                                 
            $i ++;                                                      
         my $hseq = $hsp->hit_string();                                   
            $hseq =~ s/-//g; #### remove the gap within the aligment      
         my $id = $i; ###### specifiy the id                            
         my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq); 
         # $seq_out->write_seq($hseq);                                    
         $seq_out->write_result($hseq_obj);                               
      }                                                                   
   }                                                                      
}               


Xianran 

> 
> Thank you very much for your help.
> Alicia
> 
> > Alicia,
> > 
> > > Hi, I would like to save my hit sequences from a blast result in a fasta
> > > file. I am trying some things but I have problems using Bio::SearchIO
> > > and Bio::SeqIO. Hope anyone could help me with this. Here is my current
> > > code:
> > > # my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" =>
> > > "fasta");
> > > my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format"
> > > => "fasta");
> > > ...
> > >        my $hseq = $hsp->hit_string();
> > >          # $seq_out->write_seq($hseq);
> > >          $seq_out->write_result($hseq);
> > 
> > You have encountered two common problems for BioPerl beginners:
> > 
> > 1. "fasta" means two different things! In SearchIO it refers to the
> > output format of the "fasta" sequence alignment software. In SeqIO it
> > refers to a file format that stores just sequences. Confusing, I know.
> > You need SeqIO and write_seq, not SearchIO and write_result.
> > 
> > 2. $hseq is a STRING which has the raw sequence letters in it.
> > However, the write_seq() method needs a Bio::Seq object (which has
> > extra details like the name and ID) not a raw string.
> > 
> > The example code Jay Hannah supplied in his reply looks pretty good,
> > you should try it.
> > 
> > -- 
> > --Torsten Seemann
> > --Victorian Bioinformatics Consortium, Monash University
> > 
> > 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-lÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿá¶Úÿÿ÷'þf¢—üš†Šÿ




More information about the Bioperl-l mailing list