[Bioperl-l] SimpleAlign Query

Fields, Christopher J cjfields at illinois.edu
Thu Jun 20 17:10:17 UTC 2013


Agreed, start/end with genome coordinates has start always less than end; the strand dictates direction for the feature of interest.  You must use this in combination with start/end to get everything you need (this is *NOT* unique to BioPerl, most other bioinformatics software follows the same general dictum).

That was an informative and somewhat highly confusing exchange (doesn't help to have two people with very close names involved :)

chris (fields with an 's')

On Jun 20, 2013, at 11:24 AM, Jun Yin <yinjun111 at gmail.com> wrote:

> Hi, Chris Field,
> 
> I have tried your script. It worked well in my computer. There should not
> be any problem with it.
> 
> For your question, in BioPerl, the sequence object always has the start
> coordinate smaller than the end coordinate. It is not a bug.
> 
> The strand information is kept in the sequence object. You can retrieve
> the strand information by $seq->strand .
> 
> As suggested by Chris J Fields, It is more common to use SearchIO to deal
> with blast output. You can try that module too.
> 
> Cheers,
> -- 
> Jun Yin, Ph.D.
> Postdoc Associate
> 
> James Noonan Group
> Department of Genetics
> Yale University School of Medicine
> Sterling Hall of Medicine, I-120A
> 333 Cedar Street 
> New Haven, CT 06510
> http://www.yale.edu/noonanlab/Jun.html
> 
> 
> 
> 
> 
> On 6/19/13 9:15 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
> 
>> The alignments from BLAST reports are generally parsed via Bio::SearchIO.
>> There is a HOWTO on SearchIO:
>> 
>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>> 
>> Tiling in BioPerl uses these objects, but you can get Bio::SimpleAlign
>> from the HSPs.
>> 
>> chris
>> 
>> On Jun 18, 2013, at 10:49 AM, "Yin, Jun" <jun.yin at yale.edu>
>> wrote:
>> 
>>> Hi, Chris,
>>> 
>>> No problem. I can help to look at what is the problem. Yes, usually in
>>> Bio::SimpleAlign even if the sequence is reverse complementary, the
>>> start
>>> coordinate should be smaller than the end coordinate. But it seems
>>> Bio::AlignIO::bl2seq is a wrapper for Bio::SearchIO::blast. It is a bit
>>> different from the other Bio::AlignIO modules, so there may be some
>>> inconsistency.
>>> 
>>> Can you send me the input file for your script? So I can try to figure
>>> out
>>> where the problem is.
>>> 
>>> I would like to recommend you to ask in the bioperl mail list. There
>>> will
>>> be more experts in BioPerl there.
>>> 
>>> 
>>> Cheers,
>>> -- 
>>> Jun Yin, Ph.D.
>>> Postdoc Associate
>>> 
>>> James Noonan Group
>>> Department of Genetics
>>> Yale University School of Medicine
>>> Sterling Hall of Medicine, I-120A
>>> 333 Cedar Street
>>> New Haven, CT 06510
>>> http://www.yale.edu/noonanlab/Jun.html
>>> 
>>> 
>>> 
>>> 
>>> On 6/18/13 11:23 AM, "Christopher Field" <chris.field at unibas.ch> wrote:
>>> 
>>>> Dear Dr. Yin,
>>>> 
>>>> I've got a problem with SimpleAlign.pm, and since you are responsible
>>>> for
>>>> the most recent updates on this module, I hope it's ok to email and ask
>>>> for your help.
>>>> To summarise my short program, it uses bl2seq to throw some sanger
>>>> reads
>>>> against a genome, and then I'm tiling the reported hits into a visually
>>>> accessible format.
>>>> The problem has arisen, that sometimes the alignment is with the
>>>> reverse
>>>> complement, but the start and end of both sequences in the Align object
>>>> are such that the start is always lower than the end. The strand flag
>>>> also appears uninitialised. Is this a straightforward bug, or a result
>>>> of
>>>> something I'm doing in my code?
>>>> Any help would be greatly appreciated.
>>>> 
>>>> Chris Field
>>>> Group van Nimwegen
>>>> Biozentrum der Universität Basel
>>>> 4056-CH Basel
>>>> 
>>>> Code:
>>>> 
>>>> #!/usr/bin/env perl;
>>>> 
>>>> use v5.8.1;
>>>> use warnings;
>>>> use strict;
>>>> 
>>>> use Bio::Seq;
>>>> use Bio::SeqIO;
>>>> use Bio::AlignIO;
>>>> use Bio::SimpleAlign;
>>>> use Bio::Tools::Run::StandAloneBlast;
>>>> 
>>>> my $usage = "align.pl reference query\n";
>>>> 
>>>> open (my $fo,'>',"align.out",) or die $!;
>>>> 
>>>> my $rfile = shift or die $usage;
>>>> my $qfile = shift or die $usage;
>>>> 
>>>> my $rstream = Bio::SeqIO->new(-file => $rfile);
>>>> my $qstream = Bio::SeqIO->new(-file => $qfile);
>>>> 
>>>> my $rseq = $rstream->next_seq;
>>>> 
>>>> my $align = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastn',
>>>> -outfile => 'bl2seq.out');
>>>> 
>>>> while (my $qseq = $qstream->next_seq){
>>>>      $align->bl2seq($qseq,$rseq);
>>>>      my $astream = Bio::AlignIO->new(-file => 'bl2seq.out', -format =>
>>>> 'bl2seq');
>>>>      my @seq = ("n") x $qseq->length;
>>>>      my @match = (" ") x $qseq->length;
>>>>      my @pos = (" ") x $qseq->length;
>>>>      while (my $align = $astream->next_aln){
>>>>              my $qhit = $align->get_seq_by_pos(1);
>>>>              my $rhit = $align->get_seq_by_pos(2);
>>>>              if(!grep(/[acgt]/, at seq[$qhit->start-1 .. $qhit->end-1])){
>>>>                      print $rhit->start,":",$rhit->end,"\n";
>>>>                      @seq[$qhit->start-1 .. $qhit->end-1] =
>>>> split("",$rhit->seq);
>>>>                      @match[$qhit->start-1 .. $qhit->end-1] =
>>>> split("",$align->match_line);
>>>>                      @pos[$qhit->start-1 .. $qhit->end-1] =
>>>> ("<","-","-",split("",$rhit->start),("-") x
>>>> 
>>>> ($qhit->length-(length($rhit->start)+length($rhit->end)+6)),split("",$rh
>>>> it
>>>> ->end),"-","-",">");
>>>>              }
>>>>      }
>>>>      print $fo $qseq->display_id,"\n";
>>>>      print $fo $qseq->seq,"\n";
>>>>      print $fo @match,"\n";
>>>>      print $fo @seq,"\n";
>>>>      print $fo @pos,"\n\n";
>>>> }
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 





More information about the Bioperl-l mailing list