[Bioperl-guts-l] [Bug 1644] New: non-matching sequence casuses sim4 parsing to silently abort early

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue May 25 17:06:56 EDT 2004


http://bugzilla.bioperl.org/show_bug.cgi?id=1644

           Summary: non-matching sequence casuses sim4 parsing to silently
                    abort early
           Product: Bioperl
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: cariaso at yahoo.com


If sim4 encounters a sequence which does not match to the reference sequence,
parsing of all hits after that point is abandoned. This is probably an error.
Although proper behavior is a bit less certain. For my purposes it should either
omit the sequence from the alignment, or include it as all gaps.

My test case attempts to align 6 sequences, however one of them does not match,
and causes sim4 to abort. I then reverse the order and re-align to show that the
missed ones are now found, and the previously found ones are now missed. 



While I believe this bug to be specific to the parser, I have first encountered
it with the Run::Sim4 modules, so my initial example will be based on that. I
hope to contribute a test case based on pure-parsing shortly.

Test Case:
    {
        my $ref = Bio::Seq->new(-if => 'reference',
                                -seq => join('',qw(
   CTCCTTACTTTTGGGTCGGGCCCTCCGGGAAGATGGCGGCCGTGCAGGCGGCCGAGGTGA
   AAGTGGATGGCAGCGAGCCGAAACTGAGCAAGAACCATGTTGACGCAAGCTGCTGTAAGG
   CTTGTTAGGGGGTCCCTGCGCAAAACCTCCTGGGCAGAGTGGGGTCACAGGGAACTGCGA
   CTGGGTCAACTTGCTCCTTTCACAGCGCCTCACAAGGACAAGTCATTTTCTGATCAAAGA
   AGTGAGCTGAAGAGACGCCTGAAAGCTGAGAAGAAAGTAGCAGAGAAGGAGGCCAAACAG
   AAAGAGCTCAGTGAGAAACAGCTAAGCCAAGCCACTGCTGCTGCCACCAACCACACCACT
   GATAATGGTGTGGGTCCTGAGGAAGAGAGCGTGGACCCAAATCAATACTACAAAATCCGC
   AGTCAAGCAATTCATCAGCTGAAGGTCAATGGGGAAGACCCATACCCACACAAGTTCCAT
   GTAGACATCTCACTCACTGACTTCATCCAAAAATATAGTCACCTGCAGCCTGGGGATCAC
   CTGACTGACATCACCTTAAAGGTGGCAGGTAGGATCCATGCCAAAAGAGCTTCTGGGGGA
   AAGCTCATCTTCTATGATCTTCGAGGAGAGGGGGTGAAGTTGCAAGTCATGGCCAATTCC
   AGAAATTATAAATCAGAAGAAGAATTTATTCATATTAATAACAAACTGCGTCGGGGAGAC
   ATAATTGGAGTTCAGGGGAATCCTGGTAAAACCAAGAAGGGTGAGCTGAGCATCATTCCG
   TATGAGATCACACTGCTGTCTCCCTGTTTGCATATGTTACCTCATCTTCACTTTGGCCTC
   AAAGACAAGGAAACAAGGTATCGCCAGAGATACTTGGACTTGATCCTGAATGACTTTGTG
   AGGCAGAAATTTATCATCCGCTCTAAGATCATCACATATATAAGAAGTTTCTTAGATGAG
   CTGGGATTCCTAGAGATTGAAACTCCCATGATGAACATCATCCCAGGGGGAGCCGTGGCC
   AAGCCTTTCATCACTTATCACAACGAGCTGGACATGAACTTATATATGAGAATTGCTCCA
   GAACTCTATCATAAGATGCTTGTGGTTGGTGGCATCGACCGGGTTTATGAAATTGGACGC
   CAGTTCCGGAATGAGGGGATTGATTTGACGCACAATCCTGAGTTCACCACCTGTGAGTTC
   TACATGGCCTATGCAGACTATCACGATCTCATGGAAATCACGGAGAAGATGGTTTCAGGG
   ATGGTGAAGCATATTACAGGCAGTTACAAGGTCACCTACCACCCAGATGGCCCAGAGGGC
   CAAGCCTACGATGTTGACTTCACCCCACCCTTCCGGCGAATCAACATGGTAGAAGAGCTT
   GAGAAAGCCCTGGGGATGAAGCTGCCAGAAACGAACCTCTTTGAAACTGAAGAAACTCGC
   AAAATTCTTGATGATATCTGTGTGGCAAAAGCTGTTGAATGCCCTCCACCTCGGACCACA
   GCCAGGCTCCTTGACAAGCTTGTTGGGGAGTTCCTGGAAGTGACTTGCATCAATCCTACA
   TTCATCTGTGATCACCCACAGATAATGAGCCCTTTGGCTAAATGGCACCGCTCTAAAGAG
   GGTCTGACTGAGCGCTTTGAGCTGTTTGTCATGAAGAAAGAGATATGCAATGCGTATACT
   GAGCTGAATGATCCCATGCGGCAGCGGCAGCTTTTTGAAGAACAGGCCAAGGCCAAGGCT
   GCAGGTGATGATGAGGCCATGTTCATAGATGAAAACTTCTGTACTGCCCTGGAATATGGG
   CTGCCCCCCACAGCTGGCTGGGGCATGGGCATTGATCGAGTCGCCATGTTTCTCACGGAC
   TCCAACAACATCAAGGAAGTACTTCTGTTTCCTGCCATGAAACCCGAAGACAAGAAGGAG
   AATGTAGCAACCACTGATACACTGGAAAGCACAACAGTTGGCACTTCTGTCTAGAAAATA
   ATAATTGCAAGTTGTATAACTCAGGCGTCTTTGCATTTCTGCGAAAGATCAAGGTCTGCA
   AGGGAATTCTTGTGTGCTGCTTTCCATTTGACACCGCAGTTCTGTTCAGCCATCAGAAGA
   GAGACAAGGAATTAAAAATTTCTTTTTAATCCTGTTACCAAATAAGCCATTTGTCTCTTC
   TCTTTACTTTTAGAATATCAACTTTTTTCCCCAGACTTCTGGCCTAATGAATGTTTAGGA
   TTTTACCATCTT
                                                   )));
        my @seqs = (
                    Bio::Seq->new(-id => 'non_lethal_1',
                                  -seq => 'TGAGCTGAAGAGACGCCTG'),
                    Bio::Seq->new(-id => 'non_lethal_2',
                                  -seq => 'TTGCAACTTCACCCCCTC'),
                    Bio::Seq->new(-id => 'non_lethal_3',
                                  -seq => 'GAGCTTCTGGGGGAAAGC'),
                    Bio::Seq->new(-id => 'lethal_1',
                                  -seq => 'CGCACCTTGACCTGGAAA'),
                    Bio::Seq->new(-id => 'non_lethal_4',
                                  -seq => 'CATAAACCCGGTCGATGC'),
                    Bio::Seq->new(-id => 'non_lethal_5',
                                  -seq => 'ACATCATCCCAGGGGGAG'),
                    );

        my %orders = ('forward' => [         @seqs],
                      'reverse' => [ reverse @seqs]);

        my $sim4 = Bio::Tools::Run::Alignment::Sim4->new();
        $sim4->executable('/usr/local/bin/sim4');


        foreach my $key (keys %orders) {

            print "Run $key\n\n";
            my @exon_sets = $sim4->align($orders{$key}, $ref);
            foreach my $set (@exon_sets){
                foreach my $exon($set->sub_SeqFeature){
                    print $exon->start."\t".$exon->end."\t".$exon->strand;
                    print "\tMatched
".$exon->est_hit->seq_id."\t".$exon->est_hit->start."\t".$exon->est_hit->end."\n";
                }
            }
        }


        return;

    }



What you will see in the output is:

Run forward

243	261	1	Matched non_lethal_1	1	19
628	645	-1	Matched non_lethal_2	1	18
587	604	1	Matched non_lethal_3	1	18
Run reverse

995	1012	1	Matched non_lethal_5	1	18
1112	1129	-1	Matched non_lethal_4	1	18



which should probably instead be:

243	261	1	Matched non_lethal_1	1	19
628	645	-1	Matched non_lethal_2	1	18
587	604	1	Matched non_lethal_3	1	18
1112	1129	-1	Matched non_lethal_4	1	18
995	1012	1	Matched non_lethal_5	1	18
Run reverse

995	1012	1	Matched non_lethal_5	1	18
1112	1129	-1	Matched non_lethal_4	1	18
587	604	1	Matched non_lethal_3	1	18
628	645	-1	Matched non_lethal_2	1	18
243	261	1	Matched non_lethal_1	1	19



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


More information about the Bioperl-guts-l mailing list