[Bioperl-guts-l] [Bug 1644] New: non-matching sequence casuses sim4
parsing to silently abort early
bugzilla-daemon at portal.open-bio.org
bugzilla-daemon at portal.open-bio.org
Tue May 25 17:06:56 EDT 2004
http://bugzilla.bioperl.org/show_bug.cgi?id=1644
Summary: non-matching sequence casuses sim4 parsing to silently
abort early
Product: Bioperl
Version: unspecified
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Core Components
AssignedTo: bioperl-guts-l at bioperl.org
ReportedBy: cariaso at yahoo.com
If sim4 encounters a sequence which does not match to the reference sequence,
parsing of all hits after that point is abandoned. This is probably an error.
Although proper behavior is a bit less certain. For my purposes it should either
omit the sequence from the alignment, or include it as all gaps.
My test case attempts to align 6 sequences, however one of them does not match,
and causes sim4 to abort. I then reverse the order and re-align to show that the
missed ones are now found, and the previously found ones are now missed.
While I believe this bug to be specific to the parser, I have first encountered
it with the Run::Sim4 modules, so my initial example will be based on that. I
hope to contribute a test case based on pure-parsing shortly.
Test Case:
{
my $ref = Bio::Seq->new(-if => 'reference',
-seq => join('',qw(
CTCCTTACTTTTGGGTCGGGCCCTCCGGGAAGATGGCGGCCGTGCAGGCGGCCGAGGTGA
AAGTGGATGGCAGCGAGCCGAAACTGAGCAAGAACCATGTTGACGCAAGCTGCTGTAAGG
CTTGTTAGGGGGTCCCTGCGCAAAACCTCCTGGGCAGAGTGGGGTCACAGGGAACTGCGA
CTGGGTCAACTTGCTCCTTTCACAGCGCCTCACAAGGACAAGTCATTTTCTGATCAAAGA
AGTGAGCTGAAGAGACGCCTGAAAGCTGAGAAGAAAGTAGCAGAGAAGGAGGCCAAACAG
AAAGAGCTCAGTGAGAAACAGCTAAGCCAAGCCACTGCTGCTGCCACCAACCACACCACT
GATAATGGTGTGGGTCCTGAGGAAGAGAGCGTGGACCCAAATCAATACTACAAAATCCGC
AGTCAAGCAATTCATCAGCTGAAGGTCAATGGGGAAGACCCATACCCACACAAGTTCCAT
GTAGACATCTCACTCACTGACTTCATCCAAAAATATAGTCACCTGCAGCCTGGGGATCAC
CTGACTGACATCACCTTAAAGGTGGCAGGTAGGATCCATGCCAAAAGAGCTTCTGGGGGA
AAGCTCATCTTCTATGATCTTCGAGGAGAGGGGGTGAAGTTGCAAGTCATGGCCAATTCC
AGAAATTATAAATCAGAAGAAGAATTTATTCATATTAATAACAAACTGCGTCGGGGAGAC
ATAATTGGAGTTCAGGGGAATCCTGGTAAAACCAAGAAGGGTGAGCTGAGCATCATTCCG
TATGAGATCACACTGCTGTCTCCCTGTTTGCATATGTTACCTCATCTTCACTTTGGCCTC
AAAGACAAGGAAACAAGGTATCGCCAGAGATACTTGGACTTGATCCTGAATGACTTTGTG
AGGCAGAAATTTATCATCCGCTCTAAGATCATCACATATATAAGAAGTTTCTTAGATGAG
CTGGGATTCCTAGAGATTGAAACTCCCATGATGAACATCATCCCAGGGGGAGCCGTGGCC
AAGCCTTTCATCACTTATCACAACGAGCTGGACATGAACTTATATATGAGAATTGCTCCA
GAACTCTATCATAAGATGCTTGTGGTTGGTGGCATCGACCGGGTTTATGAAATTGGACGC
CAGTTCCGGAATGAGGGGATTGATTTGACGCACAATCCTGAGTTCACCACCTGTGAGTTC
TACATGGCCTATGCAGACTATCACGATCTCATGGAAATCACGGAGAAGATGGTTTCAGGG
ATGGTGAAGCATATTACAGGCAGTTACAAGGTCACCTACCACCCAGATGGCCCAGAGGGC
CAAGCCTACGATGTTGACTTCACCCCACCCTTCCGGCGAATCAACATGGTAGAAGAGCTT
GAGAAAGCCCTGGGGATGAAGCTGCCAGAAACGAACCTCTTTGAAACTGAAGAAACTCGC
AAAATTCTTGATGATATCTGTGTGGCAAAAGCTGTTGAATGCCCTCCACCTCGGACCACA
GCCAGGCTCCTTGACAAGCTTGTTGGGGAGTTCCTGGAAGTGACTTGCATCAATCCTACA
TTCATCTGTGATCACCCACAGATAATGAGCCCTTTGGCTAAATGGCACCGCTCTAAAGAG
GGTCTGACTGAGCGCTTTGAGCTGTTTGTCATGAAGAAAGAGATATGCAATGCGTATACT
GAGCTGAATGATCCCATGCGGCAGCGGCAGCTTTTTGAAGAACAGGCCAAGGCCAAGGCT
GCAGGTGATGATGAGGCCATGTTCATAGATGAAAACTTCTGTACTGCCCTGGAATATGGG
CTGCCCCCCACAGCTGGCTGGGGCATGGGCATTGATCGAGTCGCCATGTTTCTCACGGAC
TCCAACAACATCAAGGAAGTACTTCTGTTTCCTGCCATGAAACCCGAAGACAAGAAGGAG
AATGTAGCAACCACTGATACACTGGAAAGCACAACAGTTGGCACTTCTGTCTAGAAAATA
ATAATTGCAAGTTGTATAACTCAGGCGTCTTTGCATTTCTGCGAAAGATCAAGGTCTGCA
AGGGAATTCTTGTGTGCTGCTTTCCATTTGACACCGCAGTTCTGTTCAGCCATCAGAAGA
GAGACAAGGAATTAAAAATTTCTTTTTAATCCTGTTACCAAATAAGCCATTTGTCTCTTC
TCTTTACTTTTAGAATATCAACTTTTTTCCCCAGACTTCTGGCCTAATGAATGTTTAGGA
TTTTACCATCTT
)));
my @seqs = (
Bio::Seq->new(-id => 'non_lethal_1',
-seq => 'TGAGCTGAAGAGACGCCTG'),
Bio::Seq->new(-id => 'non_lethal_2',
-seq => 'TTGCAACTTCACCCCCTC'),
Bio::Seq->new(-id => 'non_lethal_3',
-seq => 'GAGCTTCTGGGGGAAAGC'),
Bio::Seq->new(-id => 'lethal_1',
-seq => 'CGCACCTTGACCTGGAAA'),
Bio::Seq->new(-id => 'non_lethal_4',
-seq => 'CATAAACCCGGTCGATGC'),
Bio::Seq->new(-id => 'non_lethal_5',
-seq => 'ACATCATCCCAGGGGGAG'),
);
my %orders = ('forward' => [ @seqs],
'reverse' => [ reverse @seqs]);
my $sim4 = Bio::Tools::Run::Alignment::Sim4->new();
$sim4->executable('/usr/local/bin/sim4');
foreach my $key (keys %orders) {
print "Run $key\n\n";
my @exon_sets = $sim4->align($orders{$key}, $ref);
foreach my $set (@exon_sets){
foreach my $exon($set->sub_SeqFeature){
print $exon->start."\t".$exon->end."\t".$exon->strand;
print "\tMatched
".$exon->est_hit->seq_id."\t".$exon->est_hit->start."\t".$exon->est_hit->end."\n";
}
}
}
return;
}
What you will see in the output is:
Run forward
243 261 1 Matched non_lethal_1 1 19
628 645 -1 Matched non_lethal_2 1 18
587 604 1 Matched non_lethal_3 1 18
Run reverse
995 1012 1 Matched non_lethal_5 1 18
1112 1129 -1 Matched non_lethal_4 1 18
which should probably instead be:
243 261 1 Matched non_lethal_1 1 19
628 645 -1 Matched non_lethal_2 1 18
587 604 1 Matched non_lethal_3 1 18
1112 1129 -1 Matched non_lethal_4 1 18
995 1012 1 Matched non_lethal_5 1 18
Run reverse
995 1012 1 Matched non_lethal_5 1 18
1112 1129 -1 Matched non_lethal_4 1 18
587 604 1 Matched non_lethal_3 1 18
628 645 -1 Matched non_lethal_2 1 18
243 261 1 Matched non_lethal_1 1 19
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
More information about the Bioperl-guts-l
mailing list