From lstein at dev.open-bio.org Fri May 1 13:35:57 2009
From: lstein at dev.open-bio.org (Lincoln Stein)
Date: Fri, 1 May 2009 13:35:57 -0400
Subject: [Bioperl-guts-l] [15660] bioperl-live/trunk/Bio/DB/SeqFeature/Store.pm: fixed documentation of segment() method to indicate cases in which multiple segment errors arise
Message-ID: <200905011735.n41HZvVo009505@dev.open-bio.org>

Revision: 15660
Author: lstein
Date: 2009-05-01 13:35:56 -0400 (Fri, 01 May 2009)

Log Message:
-----------
fixed documentation of segment() method to indicate cases in which multiple segment errors arise

Modified Paths:
--------------
    bioperl-live/trunk/Bio/DB/SeqFeature/Store.pm

Modified: bioperl-live/trunk/Bio/DB/SeqFeature/Store.pm
===================================================================
--- bioperl-live/trunk/Bio/DB/SeqFeature/Store.pm 2009-04-28 15:59:46 UTC (rev 15659)
+++ bioperl-live/trunk/Bio/DB/SeqFeature/Store.pm 2009-05-01 17:35:56 UTC (rev 15660)
@@ -1275,9 +1275,22 @@
 a get_features_by_name() internally and then transform the feature
 into the appropriate coordinates.
 
-If $absolute is a true value, then the specified coordinates are
-relative to the reference (absolute) coordinates.
+The named feature should exist once and only once in the database. If
+it exists multiple times in the database and you attempt to call
+segment() in a scalar context, you will get an exception. A workaround
+is to call the method in a list context, as in:
 
+  my ($segment) = $db->segment('contig23',1,1000);
+
+or
+  my @segments = $db->segment('contig23',1,1000);
+
+However, having multiple same-named features in the database is often
+an indication of underlying data problems.
+
+If the optional $absolute argument is a true value, then the specified
+coordinates are relative to the reference (absolute) coordinates.
+
 =cut
 
 ###

From bugzilla-daemon at portal.open-bio.org Mon May 4 07:38:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 May 2009 07:38:55 -0400
Subject: [Bioperl-guts-l] [Bug 2823] New: Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

           Summary: Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
           Product: BioPerl
           Version: 1.6 branch
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: Bio::SeqIO
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: brianli.cas at gmail.com

Platform:
BioPerl 1.6.0
Perl 5.8.8
Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server)

When parsing the EMBL file rel_ann_mus_01_r99.dat, which has big million-line
entries, Bio::SeqIO::embl->next_seq gives "Segmentation fault". This happens
when trying to get the first entry with next_seq.

A zipped version of the data file I tried to parse is available at
ftp://bio-mirror.net/biomirror/embl/release/rel_ann_mus_01_r99.dat.gz

# The code I use
my $seqio = Bio::SeqIO->new(-file   => 'rel_ann_mus_01_r99.dat',
                            -format => 'EMBL');
my $index = 1;
while (my $seq = $seqio->next_seq) {
    print "Dealing with entry: $index\n";
    # Some parse process
    $index++;
}
# end of code

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
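Revision 15660 documents how segment() behaves when a name matches several features. Called in scalar context, it raises an exception. Called in list context, it does not. Perl code usually makes that distinction with the `wantarray` builtin.

The following is a minimal sketch of the pattern only, not the Bio::DB::SeqFeature::Store implementation. The in-memory `%features` table, the `segment()` helper and its coordinate arithmetic are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature database: name => list of
# [start, end] locations. 'contig23' is deliberately stored twice.
my %features = (
    contig23 => [ [ 1, 5000 ], [ 20001, 25000 ] ],
    contig7  => [ [ 1, 8000 ] ],
);

# Sketch of a segment()-style lookup. Coordinates are interpreted
# relative to each named feature (the non-$absolute case).
sub segment {
    my ( $name, $start, $end ) = @_;
    my @segments = map {
        my $offset = $_->[0] - 1;
        "$name:" . ( $offset + $start ) . '..' . ( $offset + $end );
    } @{ $features{$name} || [] };

    # List context: hand back every match and let the caller decide.
    return @segments if wantarray;

    # Scalar context: refuse to pick one of several same-named features.
    die "multiple segment exception: '$name' matches "
      . scalar(@segments)
      . " features\n"
      if @segments > 1;
    return $segments[0];
}

my ($first) = segment( 'contig23', 1, 1000 );    # list context: no exception
my @all     = segment( 'contig23', 1, 1000 );    # list context: both matches
my $one     = segment( 'contig7',  1, 1000 );    # scalar context, unique name
my $ok      = eval { my $s = segment( 'contig23', 1, 1000 ); 1 };

print "first=$first all=@all one=$one\n";
print "scalar call on duplicate name died: $@" unless $ok;
```

The two list-context calls mirror the two workarounds given in the documentation. As the documentation notes, duplicate names usually indicate a data problem, so fixing the data is normally better than relying on the workaround.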
From bugzilla-daemon at portal.open-bio.org Mon May 4 14:23:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 May 2009 14:23:06 -0400
Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
In-Reply-To:
Message-ID: <200905041823.n44IN6sW006320@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

cjfields at bioperl.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|1.6 release                 |1.6.x point release

------- Comment #1 from cjfields at bioperl.org  2009-05-04 14:23 EST -------
Pretty sure this is Bio::Species related, but I'll have to delve into it a bit
further. Moving to 1.6.x just in case.

From bugzilla-daemon at portal.open-bio.org Mon May 4 15:58:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 May 2009 15:58:23 -0400
Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
In-Reply-To:
Message-ID: <200905041958.n44JwNSO013194@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

cjfields at bioperl.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|blocker                     |normal
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #2 from cjfields at bioperl.org  2009-05-04 15:58 EST -------
Not a bug, per se. The problem here has to do with the sequences you are
trying to load into memory, which represent full-length eukaryotic chromosome
builds and relevant features. The first record in the file you are trying to
load is:

ID   CH466519; SV 1; linear; genomic DNA; ANN; MUS; 112224630 BP.

So, yes, you'll very likely segfault after attempting to load all annotation,
features, and sequence information into memory. As we can't derive what the
memory footprint for any particular Bio::Seq is until it's loaded there really
isn't much we can do until we create a lazily implemented Bio::SeqI (and the
proper iterative interfaces for Features). That's not high on anyone's
priority list, as most consider the best option is to use a relational database
capable of storing the data you need and that can access segments of the
sequence you want w/o the memory overhead.

I personally use the Ensembl Perl API, but UCSC and Bio::DB::SeqFeature::Store
also come to mind.

From cjfields at dev.open-bio.org Mon May 4 16:11:38 2009
From: cjfields at dev.open-bio.org (Christopher John Fields)
Date: Mon, 4 May 2009 16:11:38 -0400
Subject: [Bioperl-guts-l] [15661] bioperl-live/trunk: [bug 2775]
Message-ID: <200905042011.n44KBcub000837@dev.open-bio.org>

Revision: 15661
Author: cjfields
Date: 2009-05-04 16:11:36 -0400 (Mon, 04 May 2009)

Log Message:
-----------
[bug 2775]
* carry over is_circular
* more permanent fix for persisting some object meta data TBD

Modified Paths:
--------------
    bioperl-live/trunk/Bio/PrimarySeqI.pm
    bioperl-live/trunk/t/SeqTools/SeqUtils.t

Modified: bioperl-live/trunk/Bio/PrimarySeqI.pm
===================================================================
--- bioperl-live/trunk/Bio/PrimarySeqI.pm 2009-05-01 17:35:56 UTC (rev 15660)
+++ bioperl-live/trunk/Bio/PrimarySeqI.pm 2009-05-04 20:11:36 UTC (rev 15661)
@@ -419,6 +419,7 @@
        $self->_attempt_to_load_Seq();
    }
    my $out = $seqclass->new( '-seq' => $revseq,
+                             '-is_circular' => $self->is_circular,
                              '-display_id' => $self->display_id,
                              '-accession_number' => $self->accession_number,
                              '-alphabet' => $self->alphabet,

Modified: bioperl-live/trunk/t/SeqTools/SeqUtils.t
===================================================================
--- bioperl-live/trunk/t/SeqTools/SeqUtils.t 2009-05-01 17:35:56 UTC (rev 15660)
+++ bioperl-live/trunk/t/SeqTools/SeqUtils.t 2009-05-04 20:11:36 UTC (rev 15661)
@@ -8,7 +8,7 @@
     use List::MoreUtils qw(uniq);
     use Bio::Root::Test;
 
-    test_begin(-tests => 49);
+    test_begin(-tests => 51);
 
     use_ok('Bio::PrimarySeq');
     use_ok('Bio::SeqUtils');
@@ -276,3 +276,7 @@
 is $revfeat[1]->location->to_FTstring, '1..4';
 is_deeply([uniq sort map{$_->get_all_tags}$revcom->get_SeqFeatures], [sort qw(note comment)], 'revcom_with_features - has expected tags');
 is_deeply([sort map{$_->get_tagset_values('note')}$revcom->get_SeqFeatures], [sort qw(note2 note3a note3b)], 'revcom_with_features - has expected tag values');
+# check circularity
+isnt($revcom->is_circular, 1, 'still not circular');
+$seq3 = Bio::Seq->new(-id => 3, -seq => 'ggttaaaa', -description => 'third', -is_circular => 1);
+is(Bio::SeqUtils->revcom_with_features($seq3)->is_circular, 1, 'still circular');

From bugzilla-daemon at portal.open-bio.org Mon May 4 16:13:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 May 2009 16:13:43 -0400
Subject: [Bioperl-guts-l] [Bug 2775] is_circular not maintained through ->revcom
In-Reply-To:
Message-ID: <200905042013.n44KDhTd014553@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2775

cjfields at bioperl.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from cjfields at bioperl.org  2009-05-04 16:13 EST -------
Committed this to svn. I do agree a more consistent mode of carrying over meta
data is needed (similar to clone). This shouldn't be too hard to implement per
Mark's suggestion and will likely be taken care of in the future refactoring of
LocatableSeq and other classes.

From cjfields at dev.open-bio.org Mon May 4 16:15:28 2009
From: cjfields at dev.open-bio.org (Christopher John Fields)
Date: Mon, 4 May 2009 16:15:28 -0400
Subject: [Bioperl-guts-l] [15662] bioperl-live/trunk/t/Tools/Run/StandAloneBlast.t: minor formatting fix
Message-ID: <200905042015.n44KFSqn000868@dev.open-bio.org>

Revision: 15662
Author: cjfields
Date: 2009-05-04 16:15:28 -0400 (Mon, 04 May 2009)

Log Message:
-----------
minor formatting fix

Modified Paths:
--------------
    bioperl-live/trunk/t/Tools/Run/StandAloneBlast.t

Modified: bioperl-live/trunk/t/Tools/Run/StandAloneBlast.t
===================================================================
--- bioperl-live/trunk/t/Tools/Run/StandAloneBlast.t 2009-05-04 20:11:36 UTC (rev 15661)
+++ bioperl-live/trunk/t/Tools/Run/StandAloneBlast.t 2009-05-04 20:15:28 UTC (rev 15662)
@@ -71,11 +71,11 @@
 # dashed parameters should work
 my $outfile = test_output_file();
 ok my $factory = Bio::Tools::Run::StandAloneBlast->new(-verbose => $verbose,
-    -program => 'blastn',
-    -database => $nt_database ,
-    -_READMETHOD => 'SearchIO',
-    -output => $outfile,
-    -verbose => 0);
+    -program => 'blastn',
+    -database => $nt_database ,
+    -_READMETHOD => 'SearchIO',
+    -output => $outfile,
+    -verbose => 0);
 is $factory->database, $nt_database;
 
 # Setup and then do tests that actually run blast

From bugzilla-daemon at portal.open-bio.org Mon May 4 18:18:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 May 2009 18:18:00 -0400
Subject: [Bioperl-guts-l] [Bug 2773] Bio::Tree::Node gets destroyed even though it is still live
In-Reply-To:
Message-ID: <200905042218.n44MI0gK022862@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2773

------- Comment #2 from cjfields at bioperl.org  2009-05-04 18:17 EST -------
Morgan, can you attach a simplified script using the link above demonstrating
the issue? I need something I can replicate to fix the issue. The code you
have attached is a bit long, has several wrapped lines, and doesn't actually
demonstrate the problem. If I don't hear back, I'll have to close the bug
report.

From bugzilla-daemon at portal.open-bio.org Mon May 4 22:23:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 May 2009 22:23:26 -0400
Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
In-Reply-To:
Message-ID: <200905050223.n452NQa9006592@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

------- Comment #3 from brianli.cas at gmail.com  2009-05-04 22:23 EST -------
Thanks for your suggestion of other APIs. I will try to work with them. I have
to add all flat EMBL files into relational databases for easy generation of
statistics reports.

I agree with you that it's not a good idea to load all features and sequences
into memory. Then I tried
Bio::Seq::SeqBuilder->add_unwanted_slot('features', 'seq', 'annotation').
The segfault popped up again. Will unwanted slots still be loaded?

I wonder why there is a "Segmentation fault". Is it because of a memory
shortage? I have tracked the memory use with `free -s 1`. The free memory size
stays at about 20GB (buffers counted in). Could you tell me more about why this
error happens?

(In reply to comment #2)
> Not a bug, per se. The problem here has to do with the sequences you are
> trying to load into memory, which represent full-length eukaryotic chromosome
> builds and relevant features. The first record in the file you are trying to
> load is:
> 
> ID   CH466519; SV 1; linear; genomic DNA; ANN; MUS; 112224630 BP.
> 
> So, yes, you'll very likely segfault after attempting to load all annotation,
> features, and sequence information into memory. As we can't derive what the
> memory footprint for any particular Bio::Seq is until it's loaded there really
> isn't much we can do until we create a lazily implemented Bio::SeqI (and the
> proper iterative interfaces for Features). That's not high on anyone's
> priority list, as most consider the best option is to use a relational database
> capable of storing the data you need and that can access segments of the
> sequence you want w/o the memory overhead.
> 
> I personally use the Ensembl Perl API, but UCSC and Bio::DB::SeqFeature::Store
> also come to mind.
> 
From bugzilla-daemon at portal.open-bio.org Tue May 5 22:52:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 5 May 2009 22:52:01 -0400
Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
In-Reply-To:
Message-ID: <200905060252.n462q19F001446@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

brianli.cas at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |brianli.cas at gmail.com
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |

------- Comment #4 from brianli.cas at gmail.com  2009-05-05 22:52 EST -------
I agree with Chris that it's not a good idea to load all features and sequences
into memory. Then I tried
Bio::Seq::SeqBuilder->add_unwanted_slot('features', 'seq', 'annotation').
Segfault popped again. Will unwanted slots still be loaded?

From bugzilla-daemon at portal.open-bio.org Wed May 6 08:35:54 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 6 May 2009 08:35:54 -0400
Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
In-Reply-To:
Message-ID: <200905061235.n46CZsNo008611@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

------- Comment #5 from cjfields at bioperl.org  2009-05-06 08:35 EST -------
I'll take a look; it may be incomplete integration of SeqBuilder into EMBL
parsing.
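Comments #3 and #4 ask whether slots marked as unwanted are still loaded. Comment #5 suspects that SeqBuilder is not fully wired into EMBL parsing. If the parser never consults the builder, marking slots as unwanted presumably cannot reduce what is read into memory.

When only per-entry summaries are needed (the reporter's goal is statistics reports), one workaround is to skip object construction entirely. The sketch below is plain Perl, not a BioPerl API, and `scan_embl()` is a hypothetical helper. It makes three assumptions:

- Entries use the standard EMBL line prefixes (`ID`, `FT`, `//`).
- Each `FT` line with a non-blank key column starts a new feature.
- The declared length can be read from the `ID` line.

```perl
use strict;
use warnings;

# Hypothetical helper: stream an EMBL flat file and keep only a small
# summary per entry (accession, declared length, number of features).
# Sequence lines, qualifiers and annotation are read and discarded.
sub scan_embl {
    my ($fh) = @_;
    my ( @stats, $cur );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^ID   (\S+);.*?(\d+) BP\./ ) {
            $cur = { id => $1, length => $2, features => 0 };
        }
        elsif ( $cur && $line =~ /^FT   \S/ ) {
            $cur->{features}++;    # new feature key; qualifier lines are indented further
        }
        elsif ( $cur && $line =~ m{^//} ) {
            push @stats, $cur;     # end-of-entry terminator
            undef $cur;
        }
    }
    return \@stats;
}

# Tiny two-entry example, read from an in-memory filehandle.
my $embl = <<'EMBL';
ID   TEST1; SV 1; linear; genomic DNA; STD; MUS; 20 BP.
FH   Key             Location/Qualifiers
FT   source          1..20
FT                   /organism="Mus musculus"
FT   gene            3..15
FT                   /gene="abc"
SQ   Sequence 20 BP; 5 A; 5 C; 5 G; 5 T; 0 other;
     acgtacgtac gtacgtacgt                                              20
//
ID   TEST2; SV 1; linear; genomic DNA; STD; MUS; 8 BP.
FT   source          1..8
SQ   Sequence 8 BP; 2 A; 2 C; 2 G; 2 T; 0 other;
     acgtacgt                                                            8
//
EMBL

open my $fh, '<', \$embl or die "cannot open in-memory file: $!";
my $stats = scan_embl($fh);
close $fh;
printf "%s\tlength=%d\tfeatures=%d\n", @{$_}{qw(id length features)} for @$stats;
```

Only the current line and the accumulated summaries are held in memory. Memory use therefore grows with the number of entries rather than with the size of a record such as CH466519. For anything beyond simple counts, the relational-database route suggested in comment #2 still applies.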
From cjfields at dev.open-bio.org Wed May 6 12:03:20 2009
From: cjfields at dev.open-bio.org (Christopher John Fields)
Date: Wed, 6 May 2009 12:03:20 -0400
Subject: [Bioperl-guts-l] [15663] bioperl-live/trunk/README: test commit
Message-ID: <200905061603.n46G3Ktn009705@dev.open-bio.org>

Revision: 15663
Author: cjfields
Date: 2009-05-06 12:03:19 -0400 (Wed, 06 May 2009)

Log Message:
-----------
test commit

Modified Paths:
--------------
    bioperl-live/trunk/README

Modified: bioperl-live/trunk/README
===================================================================
--- bioperl-live/trunk/README 2009-05-04 20:15:28 UTC (rev 15662)
+++ bioperl-live/trunk/README 2009-05-06 16:03:19 UTC (rev 15663)
@@ -2,7 +2,7 @@
 
 This is the README file for the Bioperl central distribution.
 
-o Version
+o Version
 
 This is bioperl-live, from BioPerl Subversion HEAD

From cjfields at dev.open-bio.org Wed May 6 12:12:01 2009
From: cjfields at dev.open-bio.org (Christopher John Fields)
Date: Wed, 6 May 2009 12:12:01 -0400
Subject: [Bioperl-guts-l] [15664] bioperl-live/trunk/README: revert last commit (test)
Message-ID: <200905061612.n46GC1AE009880@dev.open-bio.org>

Revision: 15664
Author: cjfields
Date: 2009-05-06 12:12:00 -0400 (Wed, 06 May 2009)

Log Message:
-----------
revert last commit (test)

Modified Paths:
--------------
    bioperl-live/trunk/README

Modified: bioperl-live/trunk/README
===================================================================
--- bioperl-live/trunk/README 2009-05-06 16:03:19 UTC (rev 15663)
+++ bioperl-live/trunk/README 2009-05-06 16:12:00 UTC (rev 15664)
@@ -2,7 +2,7 @@
 
 This is the README file for the Bioperl central distribution.
 
-o Version
+o Version
 
 This is bioperl-live, from BioPerl Subversion HEAD

From heikki at dev.open-bio.org Wed May 6 13:20:34 2009
From: heikki at dev.open-bio.org (Heikki Lehvaslaiho)
Date: Wed, 6 May 2009 13:20:34 -0400
Subject: [Bioperl-guts-l] [15665] bioperl-live/trunk/Bio/Tree/Node.pm: remove spurious documentation
Message-ID: <200905061720.n46HKYmP010108@dev.open-bio.org>

Revision: 15665
Author: heikki
Date: 2009-05-06 13:20:34 -0400 (Wed, 06 May 2009)

Log Message:
-----------
remove spurious documentation

Modified Paths:
--------------
    bioperl-live/trunk/Bio/Tree/Node.pm

Modified: bioperl-live/trunk/Bio/Tree/Node.pm
===================================================================
--- bioperl-live/trunk/Bio/Tree/Node.pm 2009-05-06 16:12:00 UTC (rev 15664)
+++ bioperl-live/trunk/Bio/Tree/Node.pm 2009-05-06 17:20:34 UTC (rev 15665)
@@ -589,14 +589,6 @@
     return $isleaf;
 }
 
-=head2 to_string
-
- Title   : to_string
- Usage   : my $str = $node->to_string()
- Function: For debugging, provide a node as a string
- Returns : string
- Args    : none
-
 =head2 height
 
  Title   : height

From heikki at dev.open-bio.org Wed May 6 13:32:54 2009
From: heikki at dev.open-bio.org (Heikki Lehvaslaiho)
Date: Wed, 6 May 2009 13:32:54 -0400
Subject: [Bioperl-guts-l] [15666] bioperl-live/trunk: added methods sum_of_leaf_distances() and statratio()
Message-ID: <200905061732.n46HWsAh010147@dev.open-bio.org>

Revision: 15666
Author: heikki
Date: 2009-05-06 13:32:54 -0400 (Wed, 06 May 2009)

Log Message:
-----------
added methods sum_of_leaf_distances() and statratio()

Modified Paths:
--------------
    bioperl-live/trunk/Bio/Tree/Statistics.pm
    bioperl-live/trunk/t/Tree/TreeStatistics.t

Modified: bioperl-live/trunk/Bio/Tree/Statistics.pm
===================================================================
--- bioperl-live/trunk/Bio/Tree/Statistics.pm 2009-05-06 17:20:34 UTC (rev 15665)
+++ bioperl-live/trunk/Bio/Tree/Statistics.pm 2009-05-06 17:32:54 UTC (rev 15666)
@@ -2,7 +2,7 @@
 #
 # BioPerl module for Bio::Tree::Statistics
 #
-# Please direct questions and support issues to 
+# Please direct questions and support issues to
 #
 # Cared for by Jason Stajich
 #
@@ -20,15 +20,15 @@
   use Bio::Tree::Statistics;
 
-
 =head1 DESCRIPTION
 
 This should be where Tree statistics are calculated. It was
-previously where statistics from a Coalescent simulation. Currently
-it is empty because we have not added any Tree specific statistic
-calculations to this module yet. We welcome any contributions.
+previously where statistics from a Coalescent simulation.
+It now contains several methods for calculating L.
+
 =head1 FEEDBACK
 
 =head2 Mailing Lists
@@ -112,14 +112,14 @@
     my @consensus;
 
     # internal nodes are defined by their children
-    
+
     my (%lookup,%internal);
     my $i = 0;
     for my $tree ( $guide_tree, @$bs_trees ) {
         # Do this as a top down approach, can probably be
         # improved by caching internal node states, but not going
         # to worry about it right now.
-        
+
         my @allnodes = $tree->get_nodes;
         my @internalnodes = grep { ! $_->is_Leaf } @allnodes;
         for my $node ( @internalnodes ) {
@@ -221,10 +221,10 @@
 
 and the trait value with:
 
-  $traitvalue = $node->->get_tag_values('ps_trait');
+  $traitvalue = $node->->get_tag_values('ps_trait'); # only the first
   @traitvalues = $node->->get_tag_values('ps_trait');
 
-Note that there can be more that one trait values, especially for the
+Note that there can be more that one trait value, especially for the
 root node.
 
 =cut
@@ -256,8 +256,9 @@
 
 This is the first half of the Fitch algorithm that is enough for
-calculating the parsimony values. The trait/chararacter states are
-commonly left in ambiguos state. To resolve them, run L.
+calculating the resolved parsimony values. The trait/chararacter
+states are commonly left in ambiguos state. To resolve them, run
+L.
 
 =cut
 
@@ -400,13 +401,6 @@
 
 Depends on Fitch's parsimony score (PS).
 
-
-  PERSISTENCE (T, A)
-  If S(T) ≠ A, then 0
-  ElseIf T is a leaf, then 1
-  Else MIN( PERSISTENCE (L, A), PERSISTENCE (R, A) ) + 1
-
-
 =cut
 
 sub _persistence {
@@ -460,19 +454,6 @@
 
 Depends on Fitch's parsimony score (PS).
 
-  COUNT-CLUSTER (T, A)
-  If P(T)=0, then if S(T)=A then 0, else 1
-  ElseIf S(T) = A, then return COUNT-CLUSTER (L,A) + COUNT-CLUSTER (R,A)
-  Else 1.
-
-  equivalent:
-
-  COUNT-SUBCLUSTER (T, A)
-  If S(T) = A
-    then if P(T)=0 then 0.
-    Else COUNT-SUBCLUSTER (L,A) + COUNT-SUBCLUSTER (R,A)
-  Else 1.
-
 =cut
 
 sub _count_subclusters {
@@ -526,12 +507,6 @@
 
 Depends on Fitch's parsimony score (PS).
 
-  COUNT-STRAIN (T, A)
-  If T is a leaf, then if S(T)=A, then return 1, else return 0
-  ElseIf S(T) = A, then return COUNT-STRAIN (L) + COUNT-STRAIN (R)
-  Else return 0.
-
-
 =cut
 
 sub _count_leaves {
@@ -573,7 +548,7 @@
 =head2 phylotype_length
 
   Example    : phylotype_length($tree, $node);
-  Description: Sums up the branch lengths from stem to leaf
+  Description: Sums up the branch lengths within phylotype
                exluding the subclusters where the trait values
                are different
   Returns    : float, length
@@ -583,23 +558,6 @@
 
 Depends on Fitch's parsimony score (PS).
 
-  SUM (T, A)
-  If (S(T) ≠ A) or (T is a leaf), then return 0;
-  Else, let
-     cl = COUNT-STRAIN (L,A) and cr = COUNT-STRAIN (R,A)
-     sl = SUM(L,A) and sr = SUM(R,A)
-  Return cl * l(T,L) + sl + cr * l(T,R) + sr.
-
-
-  PHYLOTYPE_LENGTH(T,A)
-  If S(T) ≠ A then return 0
-  If T is a leaf return BRANCH_LENGTH(T)
-  FOR EACH CHILD(T)
-      ln = PHYLOTYPE_LENGTH(CHILD, A)
-      lenght = ln
-      length += BRANCH_LENGTH(CHILD) if CHILD is not leaf and ln
-  return lenght
-
 =cut
 
 sub _phylotype_length {
@@ -625,6 +583,7 @@
     return $length;
 }
 
+
 sub phylotype_length {
     my $self = shift;
     my $tree = shift;
@@ -636,13 +595,60 @@
     return $self->_phylotype_length($tree, $node, $value);
 }
 
+=head2 sum_of_leaf_distances
+  Example    : sum_of_leaf_distances($tree, $node);
+  Description: Sums up the branch lengths from root to leaf
+               exluding the subclusters where the trait values
+               are different
+  Returns    : float, length
+  Exceptions : all the nodes need to have the trait defined
+  Args       : 1. Bio::Tree::TreeI object
+               2. Bio::Tree::NodeI object within the tree, optional
+
+Depends on Fitch's parsimony score (PS).
+
+=cut
+
+sub _sum_of_leaf_distances {
+    my $self = shift;
+    my $tree = shift;
+    my $node = shift;
+    my $value = shift;
+
+    my $key = 'ps_trait';
+
+    $self->throw ("ERROR: ". $node->internal_id. " needs a value for trait $key")
+        unless $node->has_tag($key);
+    return 0 if $node->get_tag_values($key) ne $value;
+    #return $node->branch_length if $node->is_Leaf; # end of recursion
+    return 0 if $node->is_Leaf; # end of recursion
+
+    my $length = 0;
+    foreach my $child ($node->each_Descendent) {
+        $length += $self->_count_leaves($tree, $child, $value) * $child->branch_length +
+            $self->_sum_of_leaf_distances($tree, $child, $value);
+    }
+    return $length;
+}
+
+sub sum_of_leaf_distances {
+    my $self = shift;
+    my $tree = shift;
+    my $node = shift || $tree->get_root_node;
+
+    my $key = 'ps_trait';
+    my $value = $node->get_tag_values($key);
+
+    return $self->_sum_of_leaf_distances($tree, $node, $value);
+}
+
 =head2 genetic_diversity
 
   Example    : genetic_diversity($tree, $node);
-  Description: Diversity is the sum of phylotype branch lengths
-               L normalised by number of leaf
-               nodes within the phylotype
+  Description: Diversity is the sum of root to leaf distances
+               within the phylotype normalised by number of leaf
+               nodes
   Returns    : float, value of genetic diversity
   Exceptions : all the nodes need to have the trait defined
   Args       : 1. Bio::Tree::TreeI object
@@ -650,8 +656,6 @@
 
 Depends on Fitch's parsimony score (PS).
 
-DIVERSITY (T,A) = SUM(T,A)/COUNT-STRAIN(T,A).
-
 =cut
 
 sub genetic_diversity {
@@ -659,28 +663,28 @@
     my $tree = shift;
     my $node = shift || $tree->get_root_node;
 
-    return $self->phylotype_length($tree, $node) /
+    return $self->sum_of_leaf_distances($tree, $node) /
         $self->count_leaves($tree, $node);
 }
 
-=head2 separation
+=head2 statratio
 
-  Example    : separation($tree, $node);
-  Description: Ratio of the stem length and diversity of the
+  Example    : statratio($tree, $node);
+  Description: Ratio of the stem length and the genetic diversity of the
                phylotype L
   Returns    : float, separation score
   Exceptions : all the nodes need to have the trait defined
   Args       : 1. Bio::Tree::TreeI object
                2. Bio::Tree::NodeI object within the tree, optional
 
+TStatratio gives a measure of separation and variability within the phylotype.
+Larger values identify more rapidly evolving and recent phylotypes.
+
 Depends on Fitch's parsimony score (PS).
 
-SEPARATION (T,A)
-    STEM_LENGTH(T) / DIVERSITY(T)
-
 =cut
 
-sub separation {
+sub statratio {
     my $self = shift;
     my $tree = shift;
     my $node = shift || $tree->get_root_node;
@@ -692,9 +696,6 @@
 }
 
 
-
-
-
 =head2 ai
 
   Example    : ai($tree, $key, $node);
@@ -769,7 +770,7 @@
                3. Bio::Tree::NodeI object within the tree, optional
 
 
-* Monophyletic Clade (MC) size statistics by Salemi at al 2005. It is
+Monophyletic Clade (MC) size statistics by Salemi at al 2005. It is
 calculated for each trait value. 1<= MC <= nx, where nx is the
 number of tips with value x:

Modified: bioperl-live/trunk/t/Tree/TreeStatistics.t
===================================================================
--- bioperl-live/trunk/t/Tree/TreeStatistics.t 2009-05-06 17:20:34 UTC (rev 15665)
+++ bioperl-live/trunk/t/Tree/TreeStatistics.t 2009-05-06 17:32:54 UTC (rev 15666)
@@ -7,7 +7,7 @@
     use lib '.';
     use Bio::Root::Test;
 
-    test_begin(-tests => 35);
+    test_begin(-tests => 40);
 
     use_ok('Bio::TreeIO');
     use_ok('Bio::Tree::Statistics');
@@ -99,12 +99,28 @@
 is $stats->phylotype_length($tree, $node_i), 11, 'phylotype length';
+$node_i = $tree->find_node(-id => 'N4');
+is $stats->sum_of_leaf_distances($tree, $node_i), 1, 'sum of leaf distances';
-is sprintf ("%.3f", $stats->genetic_diversity($tree, $node_i)), 1.833, 'genetic diversity';
+$node_i = $tree->find_node(-id => 'N6');
+is $stats->sum_of_leaf_distances($tree, $node_i), 6, 'sum of leaf distances';
-is sprintf ("%.3f", $stats->separation($tree, $node_i)), 0.545, 'separation';
+$node_i = $tree->find_node(-id => 'N7');
+is $stats->sum_of_leaf_distances($tree, $node_i), 18, 'sum of leaf distances';
+$node_i = $tree->find_node(-id => 'N13');
+is $stats->sum_of_leaf_distances($tree, $node_i), 8, 'sum of leaf distances';
+$node_i = $tree->find_node(-id => 'N14');
+is $stats->sum_of_leaf_distances($tree, $node_i), 18, 'sum of leaf distances';
+
+
+
+is sprintf ("%.3f", $stats->genetic_diversity($tree, $node_i)), '3.000', 'genetic diversity';
+
+is sprintf ("%.3f", $stats->statratio($tree, $node_i)), '0.333', 'separation';
+
+
 is $stats->ai($tree, $key), 0.628906, 'association index';
 is $stats->ai($tree, $key, $node), 0.062500, 'subtree association index';

From bugzilla-daemon at portal.open-bio.org Wed May 6 13:38:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 6 May 2009 13:38:31 -0400
Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries
In-Reply-To:
Message-ID: <200905061738.n46HcVO7001718@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2823

------- Comment #6 from cjfields at bioperl.org  2009-05-06 13:38 EST -------
(In reply to comment #5)
> I'll take a look; it may be incomplete integration of SeqBuilder into EMBL
> parsing.

Appears SeqBuilder is not integrated into Bio::SeqIO::embl at all (nor in many
of the other SeqIO parsers).

I'm unsure when this can be tackled. I have started rewriting the
GenBank/EMBL/Swiss parsers to centralize data handling better, so it's probably
best to do it there and deprecate the older parsers in favor of the newer ones.
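Revision 15666 replaces the pseudocode comments in Bio::Tree::Statistics with `sum_of_leaf_distances()`. That method recurses over descendants and adds each child's branch length once for every same-trait leaf below it. The revision also redefines `genetic_diversity()` as that sum divided by the leaf count.

The sketch below mirrors that recursion on plain hash-based nodes rather than Bio::Tree::NodeI objects. The node layout and helper names are illustrative only. It also assumes trait values are already resolved, as the `ps_trait` tag is after the Fitch pass. The stem-length part of `statratio()` is not shown in this diff, so the sketch takes the stem length as an argument.

```perl
use strict;
use warnings;

# Hypothetical node layout: { trait => 'A', branch => 1.5, children => [...] }.
# Trait values are assumed to be already resolved (cf. ps_trait).
sub is_leaf { return !@{ $_[0]{children} || [] } }

# Leaves reachable from $node through nodes that all carry $value.
sub count_leaves {
    my ( $node, $value ) = @_;
    return 0 if $node->{trait} ne $value;
    return 1 if is_leaf($node);
    my $n = 0;
    $n += count_leaves( $_, $value ) for @{ $node->{children} };
    return $n;
}

# Same recursion as _sum_of_leaf_distances in r15666: each child branch
# is counted once per same-trait leaf beneath it.
sub sum_of_leaf_distances {
    my ( $node, $value ) = @_;
    return 0 if $node->{trait} ne $value;
    return 0 if is_leaf($node);
    my $length = 0;
    for my $child ( @{ $node->{children} } ) {
        $length += count_leaves( $child, $value ) * $child->{branch}
                 + sum_of_leaf_distances( $child, $value );
    }
    return $length;
}

# Mean root-to-leaf distance within the phylotype rooted at $node.
sub genetic_diversity {
    my ($node) = @_;
    my $value = $node->{trait};
    return sum_of_leaf_distances( $node, $value ) / count_leaves( $node, $value );
}

# statratio: stem length divided by genetic diversity (stem length supplied).
sub statratio {
    my ( $node, $stem_length ) = @_;
    return $stem_length / genetic_diversity($node);
}

# Example: three 'A' leaves at distances 3, 4 and 1 from the root,
# plus one 'B' leaf that is excluded from the phylotype.
my $root = {
    trait    => 'A',
    children => [
        {   trait    => 'A',
            branch   => 1,
            children => [ { trait => 'A', branch => 2 }, { trait => 'A', branch => 3 } ],
        },
        { trait => 'B', branch => 4 },
        { trait => 'A', branch => 1 },
    ],
};

printf "leaves=%d sum=%g diversity=%.3f statratio=%.3f\n",
    count_leaves( $root, 'A' ), sum_of_leaf_distances( $root, 'A' ),
    genetic_diversity($root), statratio( $root, 2 );
# prints: leaves=3 sum=8 diversity=2.667 statratio=0.750
```

Counting the shared branch once per leaf gives the same result as summing every root-to-leaf path (3 + 4 + 1 = 8). Each branch is still visited only once.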
From bosborne at dev.open-bio.org Wed May 6 14:17:02 2009 From: bosborne at dev.open-bio.org (Brian Osborne) Date: Wed, 6 May 2009 14:17:02 -0400 Subject: [Bioperl-guts-l] [15667] bioperl-run/trunk/Bio/Tools/Run/Cap3.pm: Add y Message-ID: <200905061817.n46IH2iL010251@dev.open-bio.org> Revision: 15667 Author: bosborne Date: 2009-05-06 14:17:02 -0400 (Wed, 06 May 2009) Log Message: ----------- Add y Modified Paths: -------------- bioperl-run/trunk/Bio/Tools/Run/Cap3.pm Modified: bioperl-run/trunk/Bio/Tools/Run/Cap3.pm =================================================================== --- bioperl-run/trunk/Bio/Tools/Run/Cap3.pm 2009-05-06 17:32:54 UTC (rev 15666) +++ bioperl-run/trunk/Bio/Tools/Run/Cap3.pm 2009-05-06 18:17:02 UTC (rev 15667) @@ -10,7 +10,7 @@ =head1 SYNOPSIS # Build a Cap3 factory - my $factory = Bio::Tools::Run::Coil->new($params); + my $factory = Bio::Tools::Run::Cap3->new($params); # Pass the factory an input file name... my $result = $factory->run($filename); @@ -75,7 +75,7 @@ use Bio::Factory::ApplicationFactoryI; BEGIN { - @PARAMS = qw(a b c d e f g m n o p s u v x); + @PARAMS = qw(a b c d e f g m n o p s u v x y); $PROGRAMDIR = '/usr/local/bin'; # Authorize attribute fields From bosborne at dev.open-bio.org Wed May 6 16:14:11 2009 From: bosborne at dev.open-bio.org (Brian Osborne) Date: Wed, 6 May 2009 16:14:11 -0400 Subject: [Bioperl-guts-l] [15668] bioperl-run/trunk/Bio/Tools/Run/Cap3.pm: Patch applied Message-ID: <200905062014.n46KEBFL010445@dev.open-bio.org> Revision: 15668 Author: bosborne Date: 2009-05-06 16:14:11 -0400 (Wed, 06 May 2009) Log Message: ----------- Patch applied Modified Paths: -------------- bioperl-run/trunk/Bio/Tools/Run/Cap3.pm Modified: bioperl-run/trunk/Bio/Tools/Run/Cap3.pm =================================================================== --- bioperl-run/trunk/Bio/Tools/Run/Cap3.pm 2009-05-06 18:17:02 UTC (rev 15667) +++ bioperl-run/trunk/Bio/Tools/Run/Cap3.pm 2009-05-06 20:14:11 UTC (rev 15668) @@ -5,13 +5,17 @@ 
=head1 NAME -Bio::Tools::Run::Cap3 - wrapper for Cap3 +Bio::Tools::Run::Cap3 - wrapper for CAP3 =head1 SYNOPSIS - # Build a Cap3 factory - my $factory = Bio::Tools::Run::Cap3->new($params); + # Build a Cap3 factory with an (optional) parameter list + my @params = ('y', '150'); + my $factory = Bio::Tools::Run::Cap3->new(@params); + # Specify where CAP3 is installed, if not the default directory (/usr/local/bin): + $factory->program_dir('/opt/bio/bin'); + # Pass the factory an input file name... my $result = $factory->run($filename); @@ -20,7 +24,7 @@ =head1 DESCRIPTION -*** Describe the object here + Wrapper module for CAP3 program =head1 FEEDBACK @@ -75,7 +79,7 @@ use Bio::Factory::ApplicationFactoryI; BEGIN { - @PARAMS = qw(a b c d e f g m n o p s u v x y); + @PARAMS = qw(a b c d e f g h i j k m n o p r s t u v w x y z); $PROGRAMDIR = '/usr/local/bin'; # Authorize attribute fields @@ -91,6 +95,7 @@ # chained new my $self = $caller->SUPER::new(@args); + $self->{'_program_dir'} = $PROGRAMDIR; # to facilitiate tempfile cleanup my ( undef, $tempfile ) = $self->io->tempfile(); @@ -110,7 +115,7 @@ my $attr_letter = substr( $attr, 0, 1 ); # actual key is first letter of $attr unless first attribute - # letter is underscore (as in _READMETHOD), the $attr is a BLAST + # letter is underscore (as in _READMETHOD), the $attr is a CAP3 # parameter and should be truncated to its first letter only $attr = ( $attr_letter eq '_' ) ? $attr : $attr_letter; $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr}; @@ -119,7 +124,12 @@ } sub program_dir { - $PROGRAMDIR; + my($self, $new_dir) = @_; + if (defined($self)) { + $self->{'_program_dir'} = $new_dir if (defined($new_dir)); + return $self->{'_program_dir'}; + } + return $PROGRAMDIR; } sub program_name { @@ -129,15 +139,15 @@ sub run { my ($self, $input) = @_; my $param_string = $self->_setparams; - my $exe = $self->executable; + my $exe = $self->executable(undef); + $self->throw("couldn't find executable for " . 
$self->program_name() . " in " . $self->program_dir()) if (!defined($exe)); # Create input file pointer my $infilename1 = $self->_setinput($input); if (! $infilename1) { $self->throw(" $input ($infilename1) not array of Bio::Seq objects or file name!"); } - my $commandstring = $exe . $param_string . " $infilename1"; - + my $commandstring = $exe . " $infilename1 " . $param_string; open(CAP3, "$commandstring |") || $self->throw(sprintf("%s call crashed: %s %s\n", $self->program_name, $!, $commandstring)); local $/ = undef; @@ -158,7 +168,7 @@ $value = $self->$attr(); next unless ( defined $value ); - # put params in format expected by cap3 + # put params in format expected by CAP3 $attr = '-' . $attr; $param_string .= " $attr $value "; } @@ -179,7 +189,7 @@ $infilename1 = (-e $input1) ? $input1 : 0 ; last SWITCH; } - # $input may be an array of BioSeq objects... + # $input may be an array of BioSeq objects... if (ref($input1) =~ /ARRAY/i ) { ($fh,$infilename1) = $self->io->tempfile(); $temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'Fasta'); From bugzilla-daemon at portal.open-bio.org Wed May 6 20:47:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 6 May 2009 20:47:57 -0400 Subject: [Bioperl-guts-l] [Bug 2823] Bio::SeqIO::embl->next_seq corrupted with "Segmentation fault" when parsing million-line entries In-Reply-To: Message-ID: <200905070047.n470lvf9030124@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2823 ------- Comment #7 from brianli.cas at gmail.com 2009-05-06 20:47 EST ------- > I'm unsure when this can be tackled. I have started rewriting the > GenBank/EMBL/Swiss parsers to centralize data handling better, so it's probably > best to do it there and deprecate the older parsers in favor of the newer ones. > Thanks. I will try the new ones when they are completed. 
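Revision 15668 changes Cap3's run() so that the input file name comes first and the option string follows it (`$exe . " $infilename1 " . $param_string`), matching CAP3's `cap3 File_of_reads [options]` usage. Below is a core-Perl sketch of that ordering and of the ` -x value` formatting that _setparams() produces. The helpers `build_param_string` and `build_command` are hypothetical and are not part of Bio::Tools::Run::Cap3. Keys are sorted only so the output is deterministic.

```perl
use strict;
use warnings;

# Hypothetical helper: turn single-letter CAP3 options into the
# " -x value" pairs that follow the input file on the command line.
sub build_param_string {
    my (%params) = @_;
    my $param_string = '';
    for my $attr (sort keys %params) {
        my $value = $params{$attr};
        next unless defined $value;    # unset options are skipped, as in _setparams
        $param_string .= " -$attr $value";
    }
    return $param_string;
}

# Hypothetical helper: input file first, options after it.
sub build_command {
    my ($exe, $infile, %params) = @_;
    return $exe . " $infile" . build_param_string(%params);
}

# 'y' => 150 mirrors the SYNOPSIS example ('y', '150').
my $cmd = build_command('/usr/local/bin/cap3', 'reads.fa', y => 150, o => 40);
print "$cmd\n";
```

Unlike the module, this sketch does not pad each pair with a trailing space, so it never emits doubled spaces. That difference is cosmetic, because the shell splits arguments on whitespace either way.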
From maj at dev.open-bio.org Fri May 8 08:08:57 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 8 May 2009 08:08:57 -0400 Subject: [Bioperl-guts-l] [15669] bioperl-live/trunk/Bio/DB/EUtilities.pm: file and cb flipped in pod Message-ID: <200905081208.n48C8vaJ017046@dev.open-bio.org> Revision: 15669 Author: maj Date: 2009-05-08 08:08:56 -0400 (Fri, 08 May 2009) Log Message: ----------- file and cb flipped in pod Modified Paths: -------------- bioperl-live/trunk/Bio/DB/EUtilities.pm Modified: bioperl-live/trunk/Bio/DB/EUtilities.pm =================================================================== --- bioperl-live/trunk/Bio/DB/EUtilities.pm 2009-05-06 20:14:11 UTC (rev 15668) +++ bioperl-live/trunk/Bio/DB/EUtilities.pm 2009-05-08 12:08:56 UTC (rev 15669) @@ -131,8 +131,8 @@ These are passed on to LWP::UserAgent::request() if stipulated - -file - use a LWP::UserAgent-compliant callback - -cb - dumps the response to a file (handy for large responses) + -cb - use a LWP::UserAgent-compliant callback + -file - dumps the response to a file (handy for large responses) Note: can't use file and callback at the same time -read_size_hint - bytes of content to read in at a time to pass to callback Note : Caching and parameter checking are set From cjfields at dev.open-bio.org Mon May 11 11:36:01 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 11 May 2009 11:36:01 -0400 Subject: [Bioperl-guts-l] [15670] bioperl-live/trunk/t/SearchIO/blasttable.t: add some SeqFeature-specific tests Message-ID: <200905111536.n4BFa0eD000780@dev.open-bio.org> Revision: 15670 Author: cjfields Date: 2009-05-11 11:35:59 -0400 (Mon, 11 May 2009) Log Message: ----------- add some SeqFeature-specific tests Modified Paths: -------------- 
bioperl-live/trunk/t/SearchIO/blasttable.t Modified: bioperl-live/trunk/t/SearchIO/blasttable.t =================================================================== --- bioperl-live/trunk/t/SearchIO/blasttable.t 2009-05-08 12:08:56 UTC (rev 15669) +++ bioperl-live/trunk/t/SearchIO/blasttable.t 2009-05-11 15:35:59 UTC (rev 15670) @@ -7,7 +7,7 @@ use lib '.'; use Bio::Root::Test; - test_begin(-tests => 154); + test_begin(-tests => 163); use_ok('Bio::SearchIO'); use_ok('Bio::Search::SearchUtils'); @@ -62,6 +62,7 @@ is($hit->name, 'gi|34395933|sp|P00561.2|AK1H_ECOLI'); $hit = $res->next_hit; my $hsp = $hit->next_hsp; + isa_ok($hsp, 'Bio::SeqFeatureI'); is($hsp->bits, 331); float_is($hsp->evalue, 2e-91); is($hsp->start('hit'), 16); @@ -70,4 +71,14 @@ is($hsp->end('query'), 812); is($hsp->length, 821); is($hsp->gaps, 14); + my $hit_sf = $hsp->hit; + my $query_sf = $hsp->query; + isa_ok($hit_sf, 'Bio::SeqFeatureI'); + is($hit_sf->start(), 16); + is($hit_sf->end(), 805); + is($hit_sf->strand(), 0); + isa_ok($query_sf, 'Bio::SeqFeatureI'); + is($query_sf->start(), 5); + is($query_sf->end(), 812); + is($query_sf->strand(), 0); } From cjfields at dev.open-bio.org Mon May 11 23:04:25 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 11 May 2009 23:04:25 -0400 Subject: [Bioperl-guts-l] [15671] bioperl-live/trunk/Bio/SeqIO/fastq.pm: doc fix ( thanks to John Marshall from the mail list for pointing out) Message-ID: <200905120304.n4C34PBO002777@dev.open-bio.org> Revision: 15671 Author: cjfields Date: 2009-05-11 23:04:25 -0400 (Mon, 11 May 2009) Log Message: ----------- doc fix (thanks to John Marshall from the mail list for pointing out) Modified Paths: -------------- bioperl-live/trunk/Bio/SeqIO/fastq.pm Modified: bioperl-live/trunk/Bio/SeqIO/fastq.pm =================================================================== --- bioperl-live/trunk/Bio/SeqIO/fastq.pm 2009-05-11 15:35:59 UTC (rev 15670) +++ bioperl-live/trunk/Bio/SeqIO/fastq.pm 2009-05-12 
03:04:25 UTC (rev 15671) @@ -37,9 +37,7 @@ Fastq files have sequence and quality data on a single line and the quality values are single-byte encoded. To retrieve the decimal values for qualities you need to subtract 33 (or Octal 41) from each byte and -then convert to a '2 digit + 1 space' integer. You can check if 33 is -the right number because the first byte which is always '!' -corresponds to a quality value of 0. +then convert to a '2 digit + 1 space' integer. =head1 FEEDBACK From cjfields at dev.open-bio.org Mon May 11 23:07:04 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 11 May 2009 23:07:04 -0400 Subject: [Bioperl-guts-l] [15672] bioperl-live/trunk/Bio/SeqIO/fastq.pm: relax warning ( patch courtesy John Marshall) Message-ID: <200905120307.n4C3746Z002916@dev.open-bio.org> Revision: 15672 Author: cjfields Date: 2009-05-11 23:07:04 -0400 (Mon, 11 May 2009) Log Message: ----------- relax warning (patch courtesy John Marshall) Modified Paths: -------------- bioperl-live/trunk/Bio/SeqIO/fastq.pm Modified: bioperl-live/trunk/Bio/SeqIO/fastq.pm =================================================================== --- bioperl-live/trunk/Bio/SeqIO/fastq.pm 2009-05-12 03:04:25 UTC (rev 15671) +++ bioperl-live/trunk/Bio/SeqIO/fastq.pm 2009-05-12 03:07:04 UTC (rev 15672) @@ -128,7 +128,7 @@ $seqdata->{$type} = $line; } $self->warn("Seq/Qual descriptions don't match; using sequence description\n") - unless $seqdata->{seqdesc} eq $seqdata->{qualdesc}; + unless $seqdata->{qualdesc} eq '' || $seqdata->{seqdesc} eq $seqdata->{qualdesc}; my ($id,$fulldesc) = $seqdata->{seqdesc} =~ /^\s*(\S+)\s*(.*)/ or $self->throw("Can't parse fastq header"); if ($id eq '') {$id=$fulldesc;} # FIX incase no space between \@ and name From cjfields at dev.open-bio.org Mon May 11 23:44:50 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 11 May 2009 23:44:50 -0400 Subject: [Bioperl-guts-l] [15673] bioperl-live/trunk/Build.PL: To allow 
spaces in files for MANIFEST, now require ExtUtils::Manifest 1.52 or above Message-ID: <200905120344.n4C3ioUX003424@dev.open-bio.org> Revision: 15673 Author: cjfields Date: 2009-05-11 23:44:50 -0400 (Mon, 11 May 2009) Log Message: ----------- To allow spaces in files for MANIFEST, now require ExtUtils::Manifest 1.52 or above Modified Paths: -------------- bioperl-live/trunk/Build.PL Modified: bioperl-live/trunk/Build.PL =================================================================== --- bioperl-live/trunk/Build.PL 2009-05-12 03:07:04 UTC (rev 15672) +++ bioperl-live/trunk/Build.PL 2009-05-12 03:44:50 UTC (rev 15673) @@ -30,7 +30,8 @@ 'IO::String' => 0, 'DB_File' => 0, 'Data::Stag' => 0.11, # Bio::SeqIO::swiss, we can change to 'recommend' if needed - 'Scalar::Util' => 0 # not in Perl 5.6.1, arrived in core in 5.7.3 + 'Scalar::Util' => 0, # not in Perl 5.6.1, arrived in core in 5.7.3 + 'ExtUtils::Manifest' => '1.52', # allows spaces in file names }, build_requires => { 'Test::More' => 0, @@ -72,7 +73,7 @@ 'XML::Parser::PerlSAX' => '0/parsing xml/Bio::SeqIO::tinyseq,Bio::SeqIO::game::gameSubs,Bio::OntologyIO::InterProParser,Bio::ClusterIO::dbsnp', 'XML::SAX' => '0.15/parsing xml/Bio::SearchIO::blastxml,Bio::SeqIO::tigrxml,Bio::SeqIO::bsml_sax', 'XML::SAX::Writer' => '0/writing xml/Bio::SeqIO::tigrxml', - 'XML::Simple' => '0/reading custom XML/Bio::Tools::EUtilities,Bio::DB::HIV,Bio::DB::Query::HIVQuery', + 'XML::Simple' => '0/reading custom XML/Bio::Tools::EUtilities,Bio::DB::HIV,Bio::DB::Query::HIVQuery', 'XML::Twig' => '0/parsing xml/Bio::Variation::IO::xml,Bio::DB::Taxonomy::entrez,Bio::DB::Biblio::eutils', 'XML::Writer' => '0.4/parsing and writing xml/Bio::SeqIO::agave,Bio::SeqIO::game::gameWriter,Bio::SeqIO::chadoxml,Bio::SeqIO::tinyseq,Bio::Variation::IO::xml,Bio::SearchIO::Writer::BSMLResultWriter', }, From cjfields at dev.open-bio.org Tue May 12 08:22:46 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 12 May 2009 08:22:46 
-0400 Subject: [Bioperl-guts-l] [15674] bioperl-live/trunk/Bio/SeqFeature/Generic.pm: remove_tag(), not remove_tags(). Message-ID: <200905121222.n4CCMkkF005856@dev.open-bio.org> Revision: 15674 Author: cjfields Date: 2009-05-12 08:22:45 -0400 (Tue, 12 May 2009) Log Message: ----------- remove_tag(), not remove_tags(). Fix courtesy Dan Bolser. Modified Paths: -------------- bioperl-live/trunk/Bio/SeqFeature/Generic.pm Modified: bioperl-live/trunk/Bio/SeqFeature/Generic.pm =================================================================== --- bioperl-live/trunk/Bio/SeqFeature/Generic.pm 2009-05-12 03:44:50 UTC (rev 15673) +++ bioperl-live/trunk/Bio/SeqFeature/Generic.pm 2009-05-12 12:22:45 UTC (rev 15674) @@ -390,7 +390,7 @@ } if ($self->has_tag('score')) { $self->warn("Removing score value(s)"); - $self->remove_tags('score'); + $self->remove_tag('score'); } $self->add_tag_value('score',$value); } From bugzilla-daemon at portal.open-bio.org Tue May 12 10:52:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 12 May 2009 10:52:21 -0400 Subject: [Bioperl-guts-l] [Bug 2713] [TODO] Update core Infernal parsing to v1.0, add related tests to bioperl-run In-Reply-To: Message-ID: <200905121452.n4CEqL9V027173@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2713 cjfields at bioperl.org changed: What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|1.6.x point release |1.6.1 point release ------- Comment #1 from cjfields at bioperl.org 2009-05-12 10:52 EST ------- Working on this for the next release. Should be straightforward getting the prelim. parser running and updating the wrapper.
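The fastq.pm POD fixed in revision 15671 says that you decode a quality byte by subtracting 33 (octal 41) from it and then print the result as a "2 digit + 1 space" integer. Below is a minimal core-Perl sketch of that arithmetic. It assumes Sanger (Phred+33) encoding and reads "2 digit + 1 space" as a `%2d ` layout. `decode_qualities` and `format_qualities` are illustrative names and are not Bio::SeqIO::fastq methods.

```perl
use strict;
use warnings;

# Each quality character's ASCII code minus 33 gives the Phred score,
# e.g. '!' (33) -> 0 and 'I' (73) -> 40.
sub decode_qualities {
    my ($qual_line) = @_;
    return map { ord($_) - 33 } split //, $qual_line;
}

# Print each score right-aligned in two columns, followed by one space.
sub format_qualities {
    my @scores = @_;
    return join '', map { sprintf '%2d ', $_ } @scores;
}

my @scores = decode_qualities('!+5?I');
print format_qualities(@scores), "\n";    # " 0 10 20 30 40 "
```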
From maj at dev.open-bio.org Thu May 14 11:46:40 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Thu, 14 May 2009 11:46:40 -0400 Subject: [Bioperl-guts-l] [15675] bioperl-live/trunk/Bio/Search/SearchUtils.pm: Warn if num/ conserved or num/identical are returned as Message-ID: <200905141546.n4EFkeB7014566@dev.open-bio.org> Revision: 15675 Author: maj Date: 2009-05-14 11:46:39 -0400 (Thu, 14 May 2009) Log Message: ----------- Warn if num/conserved or num/identical are returned as blank strings from HSP::matches(), set them to 0 and continue. See this thread: http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029935.html Modified Paths: -------------- bioperl-live/trunk/Bio/Search/SearchUtils.pm Modified: bioperl-live/trunk/Bio/Search/SearchUtils.pm =================================================================== --- bioperl-live/trunk/Bio/Search/SearchUtils.pm 2009-05-12 12:22:45 UTC (rev 15674) +++ bioperl-live/trunk/Bio/Search/SearchUtils.pm 2009-05-14 15:46:39 UTC (rev 15675) @@ -404,6 +404,14 @@ ($numID, $numCons) = $hsp->matches(-SEQ =>$seqType, -START => $start, -STOP => $_->{'start'} - 1); + if ($numID eq '') { + $hsp->warn("\$hsp->matches() returned '' for number identical; setting to 0"); + $numID = 0; + } + if ($numCons eq '') { + $hsp->warn("\$hsp->matches() returned '' for number conserved; setting to 0"); + $numCons = 0; + } }; if($@) { warn "\a\n$@\n"; } else { @@ -421,7 +429,15 @@ ($numID,$numCons) = $hsp->matches(-SEQ =>$seqType, -START => $_->{'stop'} + 1, -STOP => $stop); - }; + if ($numID eq '') { + $hsp->warn("\$hsp->matches() returned '' for number identical; setting to 0"); + $numID = 0; + } + if ($numCons eq '') { + $hsp->warn("\$hsp->matches() returned '' for number conserved; setting to 0"); + $numCons = 0; + } + }; if($@) { warn "\a\n$@\n"; } else { $_->{'stop'} = $stop; # Assign a new stop coordinate to the contig @@ -458,7 +474,15 @@ my ($these_ids, $these_cons); eval { ($these_ids, $these_cons) = $hsp->matches(-SEQ => 
$seqType, -START => $hsp_start, -STOP => $use_start - 1); - }; + if ($these_ids eq '') { + $hsp->warn("\$hsp->matches() returned '' for number identical; setting to 0"); + $these_ids = 0; + } + if ($these_cons eq '') { + $hsp->warn("\$hsp->matches() returned '' for number conserved; setting to 0"); + $these_cons = 0; + } + }; if($@) { warn "\a\n$@\n"; } else { $ids += $these_ids; @@ -487,6 +511,14 @@ my ($these_ids, $these_cons); eval { ($these_ids, $these_cons) = $hsp->matches(-SEQ => $seqType, -START => $use_stop + 1, -STOP => $hsp_end); + if ($these_ids eq '') { + $hsp->warn("\$hsp->matches() returned '' for number identical; setting to 0"); + $these_ids = 0; + } + if ($these_cons eq '') { + $hsp->warn("\$hsp->matches() returned '' for number conserved; setting to 0"); + $these_cons = 0; + } }; if($@) { warn "\a\n$@\n"; } else { @@ -522,6 +554,15 @@ elsif (! $overlap) { ## If there is no overlap, add the complete HSP data. ($numID,$numCons) = $hsp->matches(-SEQ=>$seqType); + if ($numID eq '') { + $hsp->warn("\$hsp->matches() returned '' for number identical; setting to 0"); + $numID = 0; + } + if ($numCons eq '') { + $hsp->warn("\$hsp->matches() returned '' for number conserved; setting to 0"); + $numCons = 0; + } + push @$contigs_ref, {'start' =>$start, 'stop' =>$stop, 'iden' =>$numID, 'cons' =>$numCons, 'strand'=>$strand,'frame'=>$frame,'hsps'=>[$hsp]}; From lstein at dev.open-bio.org Thu May 14 16:53:17 2009 From: lstein at dev.open-bio.org (Lincoln Stein) Date: Thu, 14 May 2009 16:53:17 -0400 Subject: [Bioperl-guts-l] [15676] bioperl-live/trunk/Bio/DB/SeqFeature/Store/berkeleydb.pm: fixed bug which prevented new databases from being created by autoindexing Message-ID: <200905142053.n4EKrH4b015198@dev.open-bio.org> Revision: 15676 Author: lstein Date: 2009-05-14 16:53:17 -0400 (Thu, 14 May 2009) Log Message: ----------- fixed bug which prevented new databases from being created by autoindexing Modified Paths: -------------- 
bioperl-live/trunk/Bio/DB/SeqFeature/Store/berkeleydb.pm Modified: bioperl-live/trunk/Bio/DB/SeqFeature/Store/berkeleydb.pm =================================================================== --- bioperl-live/trunk/Bio/DB/SeqFeature/Store/berkeleydb.pm 2009-05-14 15:46:39 UTC (rev 15675) +++ bioperl-live/trunk/Bio/DB/SeqFeature/Store/berkeleydb.pm 2009-05-14 20:53:17 UTC (rev 15676) @@ -369,9 +369,11 @@ my $autodir = shift; warn "Reindexing GFF files...\n" if $self->verbose; + my $exists = -e $self->_features_path; + $self->_permissions(1,1); $self->_close_databases(); - $self->_open_databases(1,0); + $self->_open_databases(1,!$exists); require Bio::DB::SeqFeature::Store::GFF3Loader unless Bio::DB::SeqFeature::Store::GFF3Loader->can('new'); my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $self, From jason at dev.open-bio.org Thu May 14 20:26:21 2009 From: jason at dev.open-bio.org (Jason Stajich) Date: Thu, 14 May 2009 20:26:21 -0400 Subject: [Bioperl-guts-l] [15677] bioperl-live/trunk/Bio/Tools/tRNAscanSE.pm: gene and exon are the proper SO even for tRNAs. Message-ID: <200905150026.n4F0QLNW015630@dev.open-bio.org> Revision: 15677 Author: jason Date: 2009-05-14 20:26:20 -0400 (Thu, 14 May 2009) Log Message: ----------- gene and exon are the proper SO even for tRNAs. 
Insure ID is unique across the genome Modified Paths: -------------- bioperl-live/trunk/Bio/Tools/tRNAscanSE.pm Modified: bioperl-live/trunk/Bio/Tools/tRNAscanSE.pm =================================================================== --- bioperl-live/trunk/Bio/Tools/tRNAscanSE.pm 2009-05-14 20:53:17 UTC (rev 15676) +++ bioperl-live/trunk/Bio/Tools/tRNAscanSE.pm 2009-05-15 00:26:20 UTC (rev 15677) @@ -87,7 +87,7 @@ use base qw(Bio::Tools::AnalysisResult); use vars qw($GeneTag $SrcTag $ExonTag); -($GeneTag,$SrcTag,$ExonTag) = qw(tRNA_gene tRNAscan-SE tRNA_exon); +($GeneTag,$SrcTag,$ExonTag) = qw(gene tRNAscan-SE exon); =head2 new @@ -263,8 +263,8 @@ if( $start > $end ) { ($start,$end,$strand) = ($end,$start,-1); } - if( $self->{'_seen'}->{"$seqid.$type"}++ ) { - $type .= "-".$self->{'_seen'}->{"$seqid.$type"}; + if( $self->{'_seen'}->{$type}++ ) { + $type .= "-".$self->{'_seen'}->{$type}; } my $gene = Bio::SeqFeature::Generic->new ( -seq_id => $seqid, @@ -276,6 +276,7 @@ -source_tag => $srctag, -tag => { 'ID' => "tRNA:$type", + 'Name' => "tRNA:$type", 'AminoAcid' => $type, 'Codon' => $codon, }); @@ -291,7 +292,7 @@ -primary_tag => $exontag, -source_tag => $srctag, -tag => { - 'Parent' => "tRNA:$type" + 'Parent' => "tRNA:$type", })); $gene->add_SeqFeature(Bio::SeqFeature::Generic->new ( -seq_id=> $seqid, From bugzilla-daemon at portal.open-bio.org Fri May 15 12:22:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 15 May 2009 12:22:22 -0400 Subject: [Bioperl-guts-l] [Bug 2828] Unrecognized line error, when parsing hmmsearch output, in module Bio::SearchIO::hmmer In-Reply-To: Message-ID: <200905151622.n4FGMMHt005014@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2828 fossandon at vtr.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |fossandon at vtr.net ------- Comment #1 from fossandon at vtr.net 2009-05-15 12:22 EST ------- This 
bug is similar to the older #2632; take a look: http://bugzilla.open-bio.org/show_bug.cgi?id=2632 I couldn't replicate your results because of a hmmsearch error: :~/Temp$ hmmsearch V-set_ls.hmm quick.txt FATAL: HMM file V-set_ls.hmm corrupt or in incorrect format? Parse failed I'm not sure what could be wrong with your format. From bugzilla-daemon at portal.open-bio.org Fri May 15 12:59:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 15 May 2009 12:59:19 -0400 Subject: [Bioperl-guts-l] [Bug 2828] Unrecognized line error, when parsing hmmsearch output, in module Bio::SearchIO::hmmer In-Reply-To: Message-ID: <200905151659.n4FGxJoG007855@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2828 cjfields at bioperl.org changed: What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|1.6.2 point release |1.6.x point release ------- Comment #2 from cjfields at bioperl.org 2009-05-15 12:59 EST ------- Please add any data as an attachment using the link above ('Create a New Attachment'). Copy/Paste of any data into the text box causes line wraps which bork replicating the error. If no attachment is added within the next month we'll have to assume user error and close the bug report. (bumping to an unspecified release due to the above)
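Revision 15675 of SearchUtils.pm repeats the same guard five times: if `$hsp->matches()` returns an empty string for the identical or conserved count, it warns and substitutes 0. One way to factor that out is shown below as a sketch with a hypothetical helper, `zero_if_blank`, which is not part of Bio::Search::SearchUtils. The `$warn_cb` code reference stands in for `$hsp->warn`. One assumption differs from the committed code: this helper also checks `defined`, so an undef count passes through unchanged and does not trigger an "uninitialized value" warning from `eq`.

```perl
use strict;
use warnings;

# Hypothetical helper: return 0 (with a warning) for an empty-string count,
# otherwise return the value unchanged. $warn_cb stands in for $hsp->warn.
sub zero_if_blank {
    my ($value, $label, $warn_cb) = @_;
    if (defined $value && $value eq '') {
        $warn_cb->("\$hsp->matches() returned '' for number $label; setting to 0");
        return 0;
    }
    return $value;
}

my @warnings;
my $warn_cb = sub { push @warnings, $_[0] };

# Simulated return values from matches(): a blank identical count.
my ($numID, $numCons) = ('', 12);
$numID   = zero_if_blank($numID,   'identical', $warn_cb);
$numCons = zero_if_blank($numCons, 'conserved', $warn_cb);
print "identical=$numID conserved=$numCons warnings=", scalar(@warnings), "\n";
```

At each call site inside the existing `eval` blocks, the guard would then reduce to one line per count, for example `$numID = zero_if_blank($numID, 'identical', sub { $hsp->warn(@_) });`.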
From cjfields at dev.open-bio.org Fri May 15 14:32:22 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Fri, 15 May 2009 14:32:22 -0400 Subject: [Bioperl-guts-l] [15678] bioperl-live/trunk/t/ClusterIO/ClusterIO.t: requires XML::SAX Message-ID: <200905151832.n4FIWMG7018187@dev.open-bio.org> Revision: 15678 Author: cjfields Date: 2009-05-15 14:32:21 -0400 (Fri, 15 May 2009) Log Message: ----------- requires XML::SAX Modified Paths: -------------- bioperl-live/trunk/t/ClusterIO/ClusterIO.t Modified: bioperl-live/trunk/t/ClusterIO/ClusterIO.t =================================================================== --- bioperl-live/trunk/t/ClusterIO/ClusterIO.t 2009-05-15 00:26:20 UTC (rev 15677) +++ bioperl-live/trunk/t/ClusterIO/ClusterIO.t 2009-05-15 18:32:21 UTC (rev 15678) @@ -15,7 +15,7 @@ } SKIP: { - test_skip(-tests => 8, -requires_module => 'XML::Parser::PerlSAX'); + test_skip(-tests => 8, -requires_module => 'XML::SAX'); my ($clusterio, $result,$hit,$hsp); $clusterio = Bio::ClusterIO->new('-tempfile' => 0, From cjfields at dev.open-bio.org Fri May 15 16:37:03 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Fri, 15 May 2009 16:37:03 -0400 Subject: [Bioperl-guts-l] [15679] bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm: stub fix for deleting tempdirs, just in case. Message-ID: <200905152037.n4FKb3fR018483@dev.open-bio.org> Revision: 15679 Author: cjfields Date: 2009-05-15 16:37:02 -0400 (Fri, 15 May 2009) Log Message: ----------- stub fix for deleting tempdirs, just in case. 
Modified Paths: -------------- bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm Modified: bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm =================================================================== --- bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm 2009-05-15 18:32:21 UTC (rev 15678) +++ bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm 2009-05-15 20:37:02 UTC (rev 15679) @@ -104,7 +104,7 @@ $self->_initialize_io(%params); my $tempdir = $self->tempdir( CLEANUP => 1); my ($tfh,$file) = $self->tempfile( DIR => $tempdir ); - + $self->{tempdir} = $tempdir; $tfh && $self->_fh($tfh); $file && $self->_filename($file); $self->length(0); @@ -307,8 +307,8 @@ my $fh = $self->_fh(); close($fh) if( defined $fh ); # this should be handled by Tempfile removal, but we'll unlink anyways. - unlink $self->_filename() - if defined $self->_filename() && -e $self->_filename; + unlink $self->_filename() if defined $self->_filename() && -e $self->_filename; + rmdir $self->{tempdir} if defined $self->{tempdir} && -e $self->{tempdir}; $self->SUPER::DESTROY(); } From cjfields at dev.open-bio.org Fri May 15 16:44:32 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Fri, 15 May 2009 16:44:32 -0400 Subject: [Bioperl-guts-l] [15680] bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm: untabify and add comment Message-ID: <200905152044.n4FKiWkF018517@dev.open-bio.org> Revision: 15680 Author: cjfields Date: 2009-05-15 16:44:32 -0400 (Fri, 15 May 2009) Log Message: ----------- untabify and add comment Modified Paths: -------------- bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm Modified: bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm =================================================================== --- bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm 2009-05-15 20:37:02 UTC (rev 15679) +++ bioperl-live/trunk/Bio/Seq/LargePrimarySeq.pm 2009-05-15 20:44:32 UTC (rev 15680) @@ -97,14 +97,14 @@ my $seq = $params{'-seq'} || $params{'-SEQ'}; if($seq ) { - delete $params{'-seq'}; - delete 
$params{'-SEQ'}; + delete $params{'-seq'}; + delete $params{'-SEQ'}; } my $self = $class->SUPER::new(%params); $self->_initialize_io(%params); my $tempdir = $self->tempdir( CLEANUP => 1); my ($tfh,$file) = $self->tempfile( DIR => $tempdir ); - $self->{tempdir} = $tempdir; + $self->{tempdir} = $tempdir; $tfh && $self->_fh($tfh); $file && $self->_filename($file); $self->length(0); @@ -150,9 +150,9 @@ my ($self, $data) = @_; if( defined $data ) { if( $self->length() == 0) { - $self->add_sequence_as_string($data); + $self->add_sequence_as_string($data); } else { - $self->warn("Trying to reset the seq string, cannot do this with a LargePrimarySeq - must allocate a new object"); + $self->warn("Trying to reset the seq string, cannot do this with a LargePrimarySeq - must allocate a new object"); } } return $self->subseq(1,$self->length); @@ -178,48 +178,48 @@ if( ref($start) && $start->isa('Bio::LocationI') ) { my $loc = $start; if( $loc->length == 0 ) { - $self->warn("Expect location lengths to be > 0"); - return ''; + $self->warn("Expect location lengths to be > 0"); + return ''; } elsif( $loc->end < $loc->start ) { - # what about circular seqs - $self->warn("Expect location start to come before location end"); + # what about circular seqs + $self->warn("Expect location start to come before location end"); } my $seq = ''; if( $loc->isa('Bio::Location::SplitLocationI') ) { - foreach my $subloc ( $loc->sub_Location ) { - if(! seek($fh,$subloc->start() - 1,0)) { - $self->throw("Unable to seek on file $start:$end $!"); - } - my $ret = read($fh, $string, $subloc->length()); - if( !defined $ret ) { - $self->throw("Unable to read $start:$end $!"); - } - if( $subloc->strand < 0 ) { - $string = Bio::PrimarySeq->new(-seq => $string)->revcom()->seq(); - } - $seq .= $string; - } + foreach my $subloc ( $loc->sub_Location ) { + if(! 
seek($fh,$subloc->start() - 1,0)) { + $self->throw("Unable to seek on file $start:$end $!"); + } + my $ret = read($fh, $string, $subloc->length()); + if( !defined $ret ) { + $self->throw("Unable to read $start:$end $!"); + } + if( $subloc->strand < 0 ) { + $string = Bio::PrimarySeq->new(-seq => $string)->revcom()->seq(); + } + $seq .= $string; + } } else { - if(! seek($fh,$loc->start()-1,0)) { - $self->throw("Unable to seek on file ".$loc->start.":". - $loc->end ." $!"); - } - my $ret = read($fh, $string, $loc->length()); - if( !defined $ret ) { - $self->throw("Unable to read ".$loc->start.":". - $loc->end ." $!"); - } - $seq = $string; + if(! seek($fh,$loc->start()-1,0)) { + $self->throw("Unable to seek on file ".$loc->start.":". + $loc->end ." $!"); } + my $ret = read($fh, $string, $loc->length()); + if( !defined $ret ) { + $self->throw("Unable to read ".$loc->start.":". + $loc->end ." $!"); + } + $seq = $string; + } if( defined $loc->strand && - $loc->strand < 0 ) { - $seq = Bio::PrimarySeq->new(-seq => $seq)->revcom()->seq(); + $loc->strand < 0 ) { + $seq = Bio::PrimarySeq->new(-seq => $seq)->revcom()->seq(); } return $seq; } if( $start <= 0 || $end > $self->length ) { $self->throw("Attempting to get a subseq out of range $start:$end vs ". - $self->length); + $self->length); } if( $end < $start ) { $self->throw("Attempting to subseq with end ($end) less than start ($start). To revcom use the revcom function with trunc"); @@ -308,7 +308,8 @@ close($fh) if( defined $fh ); # this should be handled by Tempfile removal, but we'll unlink anyways. 
unlink $self->_filename() if defined $self->_filename() && -e $self->_filename; - rmdir $self->{tempdir} if defined $self->{tempdir} && -e $self->{tempdir}; + # remove tempdirs as well + rmdir $self->{tempdir} if defined $self->{tempdir} && -e $self->{tempdir}; $self->SUPER::DESTROY(); } From maj at dev.open-bio.org Sat May 16 01:30:37 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 16 May 2009 01:30:37 -0400 Subject: [Bioperl-guts-l] [15681] bioperl-live/trunk/Bio/SearchIO/blasttable.pm: In characters(): set $self->{'_last_data'} to undef if $$data{Data} Message-ID: <200905160530.n4G5Ubd8019393@dev.open-bio.org> Revision: 15681 Author: maj Date: 2009-05-16 01:30:37 -0400 (Sat, 16 May 2009) Log Message: ----------- In characters(): set $self->{'_last_data'} to undef if $$data{Data} is a valid slot whose value is undef. This allows an undef to be propagated to object constructors and handled there as desired; in particular, when Hsp_positive => -conserved is undef (in BLASTN, e.g.), the value of the hsp's {CONSERVED} property is set to the value of {IDENTICAL} in B:S:HSP:GenericHSP::new().
(see thread at http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029941.html) Modified Paths: -------------- bioperl-live/trunk/Bio/SearchIO/blasttable.pm Modified: bioperl-live/trunk/Bio/SearchIO/blasttable.pm =================================================================== --- bioperl-live/trunk/Bio/SearchIO/blasttable.pm 2009-05-15 20:44:32 UTC (rev 15680) +++ bioperl-live/trunk/Bio/SearchIO/blasttable.pm 2009-05-16 05:30:37 UTC (rev 15681) @@ -404,7 +404,19 @@ sub characters{ my ($self,$data) = @_; - return unless ( defined $data->{'Data'} ); +# deep bug fix: set $self->{'_last_data'} to undef if $$data{Data} is +# a valid slot, whose value is undef -- +# allows an undef to be propagated to object constructors and +# handled there as desired; in particular, when Hsp_postive => -conserved +# is not defined (in BLASTN, e.g.), the value of hsp's {CONSERVED} property is +# set to the value of {IDENTICAL}. +#/maj +# return unless ( defined $data->{'Data'} ); + return unless ( grep /Data/, keys %$data ); + if ( !defined $data->{'Data'} ) { + $self->{'_last_data'} = undef; + return; + } if( $data->{'Data'} =~ /^\s+$/ ) { return unless $data->{'Name'} =~ /Hsp\_(midline|qseq|hseq)/; } From maj at dev.open-bio.org Sat May 16 01:40:13 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 16 May 2009 01:40:13 -0400 Subject: [Bioperl-guts-l] [15682] bioperl-live/trunk/t/Seq/Quality.t: quieten scope mask warning Message-ID: <200905160540.n4G5eDHC019445@dev.open-bio.org> Revision: 15682 Author: maj Date: 2009-05-16 01:40:13 -0400 (Sat, 16 May 2009) Log Message: ----------- quieten scope mask warning Modified Paths: -------------- bioperl-live/trunk/t/Seq/Quality.t Modified: bioperl-live/trunk/t/Seq/Quality.t =================================================================== --- bioperl-live/trunk/t/Seq/Quality.t 2009-05-16 05:30:37 UTC (rev 15681) +++ bioperl-live/trunk/t/Seq/Quality.t 2009-05-16 05:40:13 UTC (rev 15682) @@ -264,5 +264,5 @@ my @ranges = 
$seq->get_all_clean_ranges; is scalar @ranges, 3; my $min_length = 10; -my @ranges = $seq->get_all_clean_ranges($min_length); +@ranges = $seq->get_all_clean_ranges($min_length); is scalar @ranges, 2; From jason at dev.open-bio.org Sun May 17 21:58:41 2009 From: jason at dev.open-bio.org (Jason Stajich) Date: Sun, 17 May 2009 21:58:41 -0400 Subject: [Bioperl-guts-l] [15683] bioperl-live/trunk/Bio/DB/SeqFeature/Store: requires the specific Loader instance to be 'use'ed Message-ID: <200905180158.n4I1wf7v005020@dev.open-bio.org> Revision: 15683 Author: jason Date: 2009-05-17 21:58:40 -0400 (Sun, 17 May 2009) Log Message: ----------- requires the specific Loader instance to be 'use'ed Modified Paths: -------------- bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF2Loader.pm bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF3Loader.pm Modified: bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF2Loader.pm =================================================================== --- bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF2Loader.pm 2009-05-16 05:40:13 UTC (rev 15682) +++ bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF2Loader.pm 2009-05-18 01:58:40 UTC (rev 15683) @@ -9,6 +9,7 @@ =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; + use Bio::DB::SeqFeature::Store::GFF2Loader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', Modified: bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF3Loader.pm =================================================================== --- bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF3Loader.pm 2009-05-16 05:40:13 UTC (rev 15682) +++ bioperl-live/trunk/Bio/DB/SeqFeature/Store/GFF3Loader.pm 2009-05-18 01:58:40 UTC (rev 15683) @@ -9,6 +9,7 @@ =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; + use Bio::DB::SeqFeature::Store::GFF3Loader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', From jason at dev.open-bio.org Sun May 17 21:59:48 2009 From: jason at dev.open-bio.org
(Jason Stajich) Date: Sun, 17 May 2009 21:59:48 -0400 Subject: [Bioperl-guts-l] [15684] bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store: requires the specific Loader instance to be 'use'ed Message-ID: <200905180159.n4I1xm1F005052@dev.open-bio.org> Revision: 15684 Author: jason Date: 2009-05-17 21:59:48 -0400 (Sun, 17 May 2009) Log Message: ----------- requires the specific Loader instance to be 'use'ed Modified Paths: -------------- bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF2Loader.pm bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF3Loader.pm Modified: bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF2Loader.pm =================================================================== --- bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF2Loader.pm 2009-05-18 01:58:40 UTC (rev 15683) +++ bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF2Loader.pm 2009-05-18 01:59:48 UTC (rev 15684) @@ -9,6 +9,7 @@ =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; + use Bio::DB::SeqFeature::Store::GFF2Loader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', Modified: bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF3Loader.pm =================================================================== --- bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF3Loader.pm 2009-05-18 01:58:40 UTC (rev 15683) +++ bioperl-live/branches/branch-1-6/Bio/DB/SeqFeature/Store/GFF3Loader.pm 2009-05-18 01:59:48 UTC (rev 15684) @@ -9,6 +9,7 @@ =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; + use Bio::DB::SeqFeature::Store::GFF3Loader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', From bugzilla-daemon at portal.open-bio.org Fri May 15 10:13:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 15 May 2009 10:13:16 -0400 Subject: [Bioperl-guts-l] [Bug 2828] New: Unrecognized line 
error, when parsing hmmsearch output, in module Bio::SearchIO::hmmer Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2828 Summary: Unrecognized line error, when parsing hmmsearch output, in module Bio::SearchIO::hmmer Product: BioPerl Version: 1.6 branch Platform: PC OS/Version: Linux Status: NEW Keywords: Bioperl Severity: normal Priority: P2 Component: bioperl-run AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: thomas.gallagher at nottingham.ac.uk I think I have found a bug in the Bio::SearchIO::hmmer.pm module. When running a search on protein sequences using a hmmer file downloaded/produced by pfam (http://pfam.sanger.ac.uk/family?acc=PF07686), using the following sequence/script, the following error is produced: input hmm file (./pfam_source/V-set_ls.hmm): HMMER2.0 [2.3.2] NAME V-set ACC PF07686.9 DESC Immunoglobulin V-set domain LENG 142 ALPH Amino RF no CS yes MAP yes COM hmmbuild -F HMM_ls.ann SEED.ann COM hmmcalibrate --seed 0 HMM_ls.ann NSEQ 118 DATE Thu Apr 24 22:35:23 2008 CKSUM 3450 GA 12.0000 0.0000; TC 12.0000 0.0000; NC 11.9000 11.9000; XT -8455 -4 -1000 -1000 -8455 -4 -8455 -4 NULT -4 -8455 NULE 595 -1558 85 338 -294 453 -1158 197 249 902 -1085 -142 -21 -313 45 531 201 384 -1998 -644 EVD -51.945137 0.229520 HMM A C D E F G H I K L M N P Q R S T V W Y m->m m->i m->d i->m i->i d->m d->d b->m m->e -13 * -6845 1 911 1071 688 207 -1653 -406 -3643 -5551 -1958 -2953 1734 -85 332 1915 -1146 996 -305 -2709 -5664 -1686 1 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 X -14 -12000 -6717 -894 -1115 -701 -1378 -13 * 2 -255 -4926 -967 2 430 -1465 -3890 898 -2132 69 -399 -726 -2390 934 -2239 1541 -255 833 -266 -777 2 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -46 -11999 -5001 -894 -1115 -1492 -634 * * 3 -1820 -694 75 595 28 -936 -309 -2662 -219 -149 -1236 -235 504 1361 -819 406 759 666 1177 -4931 3 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 
359 117 -369 -294 -249 E -1 -11953 -12995 -894 -1115 -140 -3431 * * 4 -674 -4140 -1511 -6014 -435 -5859 -1348 649 -2800 1140 1392 -2446 -794 -5236 -5414 -1519 -2165 2775 -4598 -4256 4 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 5 -2653 446 -2229 329 -116 -1816 905 -506 -20 -482 -4544 -624 -1290 1651 -515 257 1752 -495 1127 418 5 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 6 189 -5493 -845 90 -5814 353 -1192 -2630 -125 -1396 -4582 -1766 39 2941 -3742 -498 1120 -19 -5677 -4994 6 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -2725 -12012 -237 -894 -1115 -701 -1378 * * 7 -336 1040 -1521 1553 110 -288 1778 -3087 331 705 -2156 -1279 -289 850 17 -1542 61 -2668 -3258 -2592 7 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -214 -9291 -2876 -894 -1115 -791 -1244 * * 8 -802 -4759 120 -1451 -1591 -4375 524 -2137 -2636 -394 -3859 -42 2123 1140 -1317 1469 1026 -1092 138 -4309 8 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -49 -11324 -4920 -894 -1115 -435 -1942 * * 9 -1235 -944 370 316 -5644 -503 826 -5395 -357 -1717 -4412 -902 2813 69 -1364 1065 20 -4945 -5506 -4824 9 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -11832 -12874 -894 -1115 -2830 -219 * * 10 296 -900 -248 51 -5659 200 798 -5409 613 -1982 -1234 -3476 882 -829 1875 623 284 -808 -5522 -812 10 - -152 -485 230 42 -379 402 107 -621 207 -469 -724 272 403 50 95 356 114 -365 -287 -253 S -7013 -4181 -94 -1979 -422 -3933 -98 * * 11 -5464 -4325 -5279 -5610 -3217 -4427 -4003 -5775 -5511 -5162 -5196 -5301 -4839 -5345 -5056 -5766 -5616 -5733 6271 -2829 15 - -149 -500 232 42 -379 399 105 -627 210 -467 -721 275 393 56 97 358 117 -369 -295 -250 . 
-3025 -193 -8903 -33 -5474 -5931 -24 * * 12 -2964 -3332 -3729 -3758 -5044 3579 -3484 -5187 -2842 -5064 -4437 -3488 -3954 -3358 1152 -3146 -3281 -4414 -4449 -4673 17 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -8 -8049 -9091 -894 -1115 -5913 -24 * * 13 -3778 -3191 -6142 -5609 -1428 -5902 -4658 1186 -5381 3059 -213 -5653 -5039 -4240 -4912 -5307 -3644 -1336 -3234 -3380 18 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -7 -8216 -9259 -894 -1115 -5303 -37 * * 14 -4345 -3689 -6326 -5964 2682 -6082 -3755 -1352 -5713 2811 -626 -5545 -5322 -4450 -5150 -5515 -4188 -2191 -2684 -2026 19 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -6 -8506 -9548 -894 -1115 -6882 -12 * * 15 -4162 -3543 -6506 -5928 1249 -6292 -4760 806 -5718 2990 -393 -5978 -5306 -4445 -5176 -5661 -3995 -1871 -3339 -3395 20 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -6 -8506 -9548 -894 -1115 -6882 -12 * * 16 -2626 -2838 -4807 -4759 -2535 -3631 -3972 -1860 -4319 1630 -1533 -4003 -4122 -4043 -4146 -3055 3415 -2141 -3742 -3528 21 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -6 -8506 -9548 -894 -1115 -6882 -12 * * 17 3018 -2361 -4791 -4812 -3375 -3247 -4184 2270 -4573 -2526 -2343 -3763 -3866 -4280 -4428 -2625 -2445 -1010 -4358 -4036 22 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -6 -8506 -9548 -894 -1115 -6882 -12 * * 18 -2520 -2910 -4542 -4797 2763 -3317 -2746 -4077 -4728 -3842 -3563 -3604 -3973 -4135 -4462 3141 -2917 -3616 -2264 -1168 23 - -150 -501 232 42 -374 397 104 -628 209 -455 -722 277 392 44 95 358 116 -368 -296 -251 . 
-3670 -120 -9548 -20 -6166 -6882 -12 * * 19 -3943 -3384 -6274 -5721 2083 -5960 -4422 -1027 -5483 2803 -439 -5614 -5174 -4350 -5008 -5273 -3802 629 -3199 -3072 25 - -147 -501 232 42 -382 397 104 -628 209 -467 -722 274 395 44 95 361 127 -371 -296 -251 . -3670 -120 -9548 -20 -6166 -6882 -12 * * 20 -1658 2145 -3847 -3278 2761 -3142 -1696 786 -2891 -1290 -691 -2710 -3197 -2479 -2700 1936 -1605 -900 -1395 1277 27 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 . -6 -8506 -9548 -894 -1115 -6882 -12 * * 21 -3404 -3976 -3757 -2699 -3815 -3855 -1795 -4147 1791 -3803 -3195 -2584 -3843 1243 -476 -3291 -3106 -3949 5535 -2947 28 - -150 -501 232 42 -382 402 109 -628 209 -467 -722 280 392 44 95 363 116 -371 -296 -251 . -3670 -120 -9548 -20 -6166 -6882 -12 * * 22 2575 -2507 -4409 -4727 -4993 -2753 -4231 -4845 -4741 -5086 -4218 -3447 3282 -4308 -4505 -2177 -2394 -3666 -5068 -5095 30 - -147 -501 232 42 -382 397 104 -628 209 -466 -722 274 402 44 95 358 118 -365 -296 -251 . -3670 -120 -9548 -20 -6166 -5489 -32 * * 23 371 -2349 -4504 -4527 -4838 956 -3998 -4627 -4338 -4864 -3924 -3261 699 -3953 -4249 1242 3062 -3456 -5044 -4906 32 - -129 -495 233 50 -384 387 93 -599 205 -466 -728 271 383 43 87 350 140 -357 -297 -262 . 
-3865 -104 -9743 -3070 -183 -52 -4826 * * 24 -1031 -748 -804 695 284 -1383 205 -2429 -3619 39 -1365 -541 1408 -940 -731 1273 -556 421 1962 94 42 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -55 -11972 -4738 -894 -1115 -2351 -315 * * 25 -636 -4075 -6332 -854 -678 -530 -4593 1161 -1005 1230 -272 -5304 -989 -1079 -1196 -1456 -201 2139 -4529 -1332 43 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11917 -12960 -894 -1115 -296 -2431 * * 26 -451 -5369 -1292 -49 445 -5016 105 64 -1906 -314 -78 -130 -2221 -882 329 1175 1849 566 -5583 -4932 44 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11999 -13041 -894 -1115 -364 -2165 * * 27 1559 -4172 -6376 -1780 -1838 509 -1193 -595 732 -326 -3373 -5372 -5871 -5084 -2242 -2773 -337 2394 -241 -4276 45 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 28 999 -5485 -2216 172 -2332 -600 -992 -1602 997 -561 -1108 178 142 1434 -2049 293 -679 877 -5671 -793 46 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 29 -1309 -5473 -1373 1451 -2095 -5001 -688 -1599 -1359 319 -1468 -374 1283 479 826 -931 679 1196 -5663 -4986 47 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -12012 -13054 -894 -1115 -701 -1378 * * 30 -3010 -8495 -566 -2725 -8607 3590 -1049 -8592 -1820 -8423 -7842 -90 -6776 -2087 -7003 -910 -6526 -8033 -8626 -7500 48 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -12012 -13054 -894 -1115 -701 -1378 * * 31 190 -5494 797 1173 -5815 1459 -938 -667 390 -2270 -4583 -61 -5088 1748 -958 4 101 -5116 -5677 -4995 49 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -12012 -13054 -894 -1115 -701 -1378 * * 32 -1303 
-769 408 -812 -5815 -4995 -934 -1702 185 -5510 26 1745 -825 -1039 63 1899 1664 -1619 -5677 -4994 50 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -13 -12012 -6845 -894 -1115 -701 -1378 * * 33 1453 -4135 -6655 -6019 -1858 -2929 -4730 388 -2599 645 -244 -5504 -5907 -5238 -5415 -2036 -1034 2839 -4595 -4252 51 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12000 -13042 -894 -1115 -377 -2121 * * 34 -2651 -5489 -600 321 -5807 -2782 240 23 815 22 -603 -105 -5089 -926 641 450 2251 -811 -5673 -1916 52 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 35 -7653 -6988 -10041 -9448 311 -9931 -8507 1685 -9280 2816 1303 -9675 -8768 -7898 -8689 -9374 -7473 -477 -6860 -7144 53 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 36 -4021 -735 -921 3 -2316 -1763 1398 -2430 299 -1298 -4583 383 1793 639 777 949 1360 -3342 -5677 -1822 54 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 37 -1658 5723 -2714 -8312 -8184 -5912 -7567 -7830 -8239 -3453 -7306 -1784 -6720 -7658 -7906 -2865 -5494 -887 -8446 -8376 55 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -24 -12012 -5921 -894 -1115 -701 -1378 * * 38 -1018 -1191 -1364 -129 1489 -5037 1932 -651 636 -1063 -4380 -3714 -806 499 -251 760 1547 -546 -309 -458 56 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -14 -11988 -6757 -894 -1115 -1894 -452 * * 39 -457 86 -6289 -2814 1810 -1348 -1243 -1424 -2484 562 -1083 240 95 191 -2563 -945 -466 749 -614 2638 57 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11975 -13017 -894 -1115 -567 -1621 * * 40 -874 -740 -1214 -29 -5798 -1462 1297 -16 
405 -1237 786 -524 -916 183 330 1794 836 -1194 430 -4982 58 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -3312 -12001 -153 -894 -1115 -1368 -707 * * 41 -1975 -2633 -1974 633 -3485 3048 -2453 834 -2457 -3295 -2599 -2123 -3266 -2234 -2827 -2112 836 -2527 -3782 -3301 59 - -150 -501 233 42 -374 398 111 -628 209 -465 -710 274 392 48 94 358 116 -371 -296 -237 C -3858 -105 -9736 -18 -6365 -5036 -45 * * 42 -81 -2866 296 -704 -3183 -2378 1547 -2931 765 -2880 -1956 683 470 1076 -1125 -11 1388 175 -3051 1025 61 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 H -4 -8956 -9998 -894 -1115 -5661 -29 * * 43 -1813 -1633 -4120 -3502 2559 -3347 -2108 1107 -3099 -1456 1584 -2967 436 -2709 -2897 -2433 468 -1050 2454 2634 62 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 H -4 -9069 -10111 -894 -1115 -1233 -800 * * 44 -1415 -297 1788 762 -4876 469 -2716 -4627 290 -1295 -202 1361 -4151 -829 -1263 1024 567 -4177 589 -934 63 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -28 -11006 -5732 -894 -1115 -2240 -343 * * 45 -3307 -169 1854 -368 -5101 1086 -563 -4852 -165 29 -3869 716 331 -957 -50 1099 -34 -4402 -4963 -4280 64 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -11250 -12292 -894 -1115 -1972 -425 * * 46 -342 -4739 -413 -1728 -788 -4570 1830 -380 -385 -861 -44 -169 137 -2826 105 1756 344 -1243 -4989 1787 65 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -940 -11476 -1063 -894 -1115 -3327 -151 * * 47 -1378 1846 -168 303 -633 -566 349 -470 -873 -1710 -29 960 -319 -354 -2573 1256 1177 -649 934 -508 66 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 G -93 -10632 -4010 -894 -1115 -2758 -231 * * 48 -1398 158 -1209 750 -1217 -3973 -2642 -1498 -214 -709 -3325 1368 894 90 -2753 1550 671 -540 423 431 67 - 
-149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -170 -10846 -3173 -894 -1115 -1994 -417 * * 49 -437 1421 614 -1301 -3946 -1826 -3173 -1524 -1359 -1140 -1076 1191 1337 -2919 -3401 1386 -21 1028 1563 -895 68 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -11099 -12141 -894 -1115 -4280 -76 * * 50 -3210 -4675 40 1301 -4993 171 -267 -1901 387 -248 -3765 678 -4278 542 -2932 1145 810 -76 231 44 69 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11144 -12186 -894 -1115 -370 -2145 * * 51 44 -638 216 -287 -2172 1020 1109 -899 127 -3138 264 873 -1 719 322 470 505 -2609 -5532 -1473 70 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11860 -12902 -894 -1115 -3849 -104 * * 52 -819 -5333 816 49 -1240 -1281 -56 242 -491 -1132 967 865 594 -243 -1180 414 960 594 -5521 -1444 71 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -464 -11860 -1864 -894 -1115 -3849 -104 * * 53 -1177 -4898 -198 -930 1389 640 1252 -2113 -1569 -36 -3990 546 -1575 1383 -1336 209 451 -621 -5086 1338 72 - -149 -500 233 43 -381 398 105 -627 210 -464 -721 278 393 45 96 359 117 -370 -295 -250 E -47 -4971 -12439 -74 -4315 -298 -2420 * * 54 -118 -5400 387 54 -212 -1239 240 -2580 -172 -2050 655 80 -898 322 529 801 340 -435 517 1890 74 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11919 -12961 -894 -1115 -904 -1103 * * 55 -1961 -577 -6617 -5980 1313 -5818 -4690 2087 -50 781 1936 -5464 -2454 -5198 -5375 -4903 -2361 1820 -705 606 75 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11968 -13010 -894 -1115 -2449 -292 * * 56 -835 -679 -1275 -378 233 -2976 2099 -48 -495 -1778 -4512 467 -5056 461 24 890 203 -2553 -657 2694 76 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 
-11968 -13010 -894 -1115 -2449 -292 * * 57 -9854 -8450 -9563 -9942 -1975 -8799 -7078 -9442 -9903 -8731 -8817 -9090 -9099 -9194 -9291 -9817 -9844 -9510 6301 -5344 77 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11968 -13010 -894 -1115 -2449 -292 * * 58 -2453 1454 -1338 -1434 1813 -2560 -802 -23 -5286 -546 281 -708 -5811 -1764 -2464 181 -899 227 -4595 3400 78 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11968 -13010 -894 -1115 -2449 -292 * * 59 -4239 -4480 -5153 -2611 387 -5538 -1272 -1377 2337 -928 -3664 -4622 -5605 1187 2807 -2680 -4175 -1135 -426 219 79 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -3150 -11968 -173 -894 -1115 -2449 -292 * * 60 -1618 -3039 1552 -794 -3330 911 -1235 -3069 -891 600 -2162 841 -2631 2023 -1414 -1494 1269 -2650 -3253 -2566 80 - -150 -501 231 41 -382 399 111 -623 211 -468 -722 274 399 48 97 359 116 -368 -296 -251 C -3987 -96 -9865 -16 -6499 -81 -4198 * * 61 -2363 -658 -735 -1442 -1183 561 1272 -2425 -223 -910 -296 -1788 -814 3305 113 -1125 -2268 -410 -5590 -260 82 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11947 -12989 -894 -1115 -804 -1227 * * 62 -164 -5467 126 -1530 -5788 -140 -374 -1522 776 -768 -4556 260 1582 846 949 646 637 -5089 -5650 -955 83 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11984 -13026 -894 -1115 -2039 -402 * * 63 132 -854 -109 -303 -669 -1141 -247 -1641 -1010 529 -429 -1023 1726 107 -454 669 835 -1250 -5621 -660 84 - -154 -453 233 41 -380 401 101 -627 208 -466 -725 273 391 52 97 360 117 -371 -299 -242 T -293 -4332 -2901 -2599 -260 -238 -2716 * * 64 68 -685 88 -1934 -5614 2837 39 -2468 -1215 -3299 -183 1191 -2275 -1329 -1142 -989 -328 -1606 -5481 -4800 93 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11809 -12851 -894 -1115 -4198 
-81 * * 65 -352 -5302 -486 1092 -5623 667 1154 -5374 1324 -2117 -4391 574 599 1416 -182 153 -347 -2254 -5485 -4802 94 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -16 -11809 -6513 -894 -1115 -4198 -81 * * 66 222 379 -1115 -405 -1726 1312 -3448 -1905 1481 -1604 -4370 -1042 832 862 -1134 281 311 -969 -5465 -596 95 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11794 -12836 -894 -1115 -4290 -76 * * 67 -1577 -670 139 -319 -1202 -1563 896 -2283 -162 321 -162 309 2737 -799 -210 5 -889 -4903 -5467 -1490 96 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11794 -12836 -894 -1115 -4290 -76 * * 68 -1163 -527 5 1673 -830 -4791 16 -1586 798 -856 -1359 -1105 -1422 1899 586 -1150 595 -64 -5462 -730 97 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11794 -12836 -894 -1115 -4290 -76 * * 69 -1579 -4850 -1221 1063 1288 -1235 -24 -902 -2033 1424 -1004 -1622 -5030 521 -134 -1261 -22 -53 2253 -60 98 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11794 -12836 -894 -1115 -4290 -76 * * 70 -1436 -781 -6448 -5812 -928 -2431 192 2274 -2547 1202 1898 -5296 -2350 -979 -5207 -844 -484 1414 -295 -1349 99 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -13 -11794 -6879 -894 -1115 -4290 -76 * * 71 1009 -814 -2079 -5796 1319 -1391 844 1841 -2210 -36 708 -5283 -2079 -5016 -5194 190 -1259 1039 -4376 707 100 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -4243 -11782 -79 -894 -1115 -3164 -171 * * 72 -1172 -2508 1579 -257 -2759 -1925 -752 2066 1375 -2490 -1651 1632 -2134 -344 -928 -1049 -1130 -2073 -2730 -2061 101 - -149 -500 232 42 -381 398 105 -622 212 -464 -721 275 396 45 95 360 117 -370 -295 -250 . 
-3004 -196 -8882 -33 -5450 -1462 -651 * * 73 -548 -4256 -2632 26 -4577 -1546 75 -4327 907 -1186 -3346 842 -1009 -516 815 1764 -285 -1429 -4440 1980 103 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -10673 -11715 -894 -1115 -5077 -43 * * 74 -1332 1066 -5276 -1311 1894 -4609 -3472 121 -4277 101 -2129 -793 -4660 57 279 771 -3018 1213 -3382 2365 104 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -10710 -11753 -894 -1115 -4208 -80 * * 75 -636 -3068 -4921 -4318 486 -4614 -100 -331 -80 790 -2267 306 -4671 71 -1413 -218 -1012 45 2223 3020 105 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -10797 -11839 -894 -1115 -3088 -181 * * 76 -604 -4499 -305 5 -392 -372 -2712 979 -118 -834 -3593 166 -445 -747 -1197 1075 1295 119 -4693 161 106 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -10987 -12029 -894 -1115 -1438 -664 * * 77 -209 1122 -984 -1455 1129 132 2237 -671 -860 -494 -3922 472 -4599 457 660 -1373 -573 -23 -5039 1764 107 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -1 -11447 -12489 -894 -1115 -1324 -736 * * 78 -1598 -5182 636 -214 184 38 551 -1630 384 85 -4272 310 -1249 -112 -498 -139 -1183 -106 1588 2248 108 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -33 -11695 -5485 -894 -1115 -1676 -542 * * 79 -192 -5240 672 -167 529 -2488 -1124 -129 483 -1103 -4332 656 -686 -158 -1956 634 466 607 -560 1332 109 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11765 -12807 -894 -1115 -1611 -572 * * 80 17 -5225 -3774 -873 26 -834 1302 -135 -54 -1158 -4327 54 961 753 364 -50 933 -387 -659 1369 110 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -15 -11845 -6625 -894 -1115 -2742 -234 * * 81 -325 -1005 270 560 -5657 291 -3497 -1053 
-325 -5352 -1108 1340 -60 798 842 774 -834 504 -590 -815 111 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -439 -11847 -1931 -894 -1115 -1865 -463 * * 82 -525 -4977 167 853 927 14 -766 -2308 741 -2666 -4066 897 -375 750 -748 370 769 -1962 -5160 838 112 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -17 -11463 -6425 -894 -1115 -2602 -260 * * 83 -511 -271 237 -1614 -5368 1261 -58 -721 444 -1308 -4138 532 577 1150 635 -1319 732 -399 -5232 -305 113 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -33 -11543 -5489 -894 -1115 -3444 -139 * * 84 -758 -5055 274 -563 -225 222 1431 -414 -143 -5071 229 1364 -198 204 897 153 157 -936 -5239 1017 114 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11548 -12590 -894 -1115 -2497 -281 * * 85 -361 -5124 436 -1198 -859 1178 50 191 178 -490 -753 -173 390 600 -514 665 204 -1052 -5310 -1270 115 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11632 -12674 -894 -1115 -1071 -932 * * 86 -2509 -5301 117 -418 -1380 493 718 -45 422 -772 -416 584 -1415 -215 1038 -146 885 -1532 552 1422 116 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -13 -11822 -6880 -894 -1115 -2993 -194 * * 87 -3858 -5205 -1805 -242 1411 -777 -841 227 154 -1523 -381 -373 534 -232 -1269 -16 454 1157 -5419 1942 117 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -1 -11826 -12868 -894 -1115 -1223 -807 * * 88 -1361 -5362 -731 490 -143 630 718 -996 411 -2993 653 -624 876 748 -3652 517 853 -778 -389 1065 118 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11905 -12947 -894 -1115 -3430 -140 * * 89 -725 -5391 563 1618 -436 876 293 -1721 -500 -1595 -4480 -23 1583 -70 -1230 184 -809 -1777 -490 -1459 119 - -149 -500 233 43 -381 399 106 -626 
210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11905 -12947 -894 -1115 -1349 -719 * * 90 -580 -5429 -155 1268 -1092 -1241 -1213 -2535 -86 -903 -4518 29 108 403 1731 887 -278 73 -5612 -4929 120 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11944 -12986 -894 -1115 -1119 -890 * * 91 -4123 1119 -2161 -570 3374 -5328 353 -1721 47 45 -450 -1948 -2463 -1619 -2231 -1276 -2106 -1755 -4998 2240 121 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -25 -11973 -5870 -894 -1115 -687 -1401 * * 92 -1649 -5457 -127 -353 -5778 -1345 -243 -5529 2118 -2939 -314 260 74 1206 328 1490 -617 -894 -5640 -4957 122 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -25 -11973 -5871 -894 -1115 -694 -1388 * * 93 -189 -5457 1080 599 -5778 2288 -3616 -5529 667 -2990 -4546 1592 -2137 -1462 -1323 -997 -1370 -1808 608 -4957 123 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -13 -11973 -6872 -894 -1115 -695 -1388 * * 94 -1063 -5469 -835 -359 -5790 -1743 -1257 -5541 587 -3322 -1469 1131 -1398 -3168 3006 992 110 -5091 -5652 -4969 124 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11985 -13028 -894 -1115 -1987 -419 * * 95 -747 114 -1229 -11 1114 -1289 -356 496 -36 -1145 -4274 -613 -222 -3353 903 -779 287 1571 -5416 821 125 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -88 -11985 -4089 -894 -1115 -955 -1046 * * 96 -812 -5396 -428 711 377 -1745 1315 -847 -513 -1321 -4485 -1175 -1315 54 1400 946 1387 -581 -5580 -574 126 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -183 -11911 -3072 -894 -1115 -1205 -821 * * 97 -248 -3914 -834 -1693 2262 -5629 -4499 1155 -5370 1253 375 -1926 -2206 -1692 -5179 -288 -1919 798 2204 -424 127 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 
-369 -294 -249 E -1 -11775 -12817 -894 -1115 -3190 -167 * * 98 377 -5236 -1209 -455 615 -1046 -876 -699 -267 -465 -4 -67 -572 139 -1155 1394 1119 -1083 -104 578 128 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11792 -12835 -894 -1115 -4296 -75 * * 99 303 -5283 -676 -836 -5603 952 687 -859 -149 -1296 671 326 79 953 370 178 151 596 -5467 -1710 129 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -18 -11792 -6344 -894 -1115 -4296 -75 * * 100 -2376 145 1991 -831 -919 -4770 897 -5340 -715 -2733 -1245 2393 -700 62 -528 1000 648 -2578 -5452 -4769 130 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11775 -12817 -894 -1115 -2793 -225 * * 101 -285 -5285 -373 -1222 -2100 984 -938 466 -326 -2049 -126 553 1956 -216 579 -766 237 -94 -5471 -4791 131 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11802 -12844 -894 -1115 -1167 -850 * * 102 -1000 -5382 486 -317 -974 440 -837 -1126 -889 -1249 -338 776 626 63 416 1701 376 -1775 -5566 -4883 132 - -152 -491 240 42 -370 398 106 -628 211 -467 -726 276 395 40 90 362 117 -372 -300 -249 C -588 -2825 -2369 -1403 -685 -1006 -994 * * 103 -1090 -446 631 -48 -764 489 -3316 -5229 1337 -1894 -4246 1085 -452 1012 1186 541 -695 -1332 -331 -4658 145 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11656 -12698 -894 -1115 -3144 -173 * * 104 -1464 1270 381 -380 -898 1320 736 -5260 388 -2720 -1083 1547 -2109 -528 -183 938 206 -4810 -60 432 146 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 T -1 -11689 -12732 -894 -1115 -1125 -885 * * 105 -2302 -5332 2018 -378 -516 -1754 274 -947 372 -1673 -222 858 -2297 -385 613 893 848 -445 -5516 -4833 147 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11843 -12885 -894 -1115 -2983 -195 * * 106 1406 260 -6496 
-5860 2554 -494 -1206 -1204 -5458 293 -3193 -692 -703 -1905 -2420 -968 -2290 807 834 1190 148 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11856 -12898 -894 -1115 -2751 -232 * * 107 -1271 -5180 -689 -1307 -1301 -4917 1693 -577 -3202 -2131 -4290 1305 -5008 10 -1050 2176 1343 -348 -5412 603 149 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11871 -12913 -894 -1115 -68 -4437 * * 108 -7003 -1173 -9477 -8948 -507 -9199 -8125 1223 -8760 2866 1133 -8892 -8498 -7797 -8398 -8520 -2706 399 -6880 -6988 150 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 109 -1828 -5439 -3899 -1166 -1353 -5011 476 -1098 1024 -416 -1259 894 -2447 825 1191 152 2279 -1269 -5638 -4971 151 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 110 -2619 -4239 -6774 -6143 -634 -5986 -4865 3287 -840 1144 1624 -5632 -6030 -5366 -5545 -1997 -2325 -377 -4723 -4382 152 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 111 -1051 -5493 -2033 -140 -598 -2545 272 -5563 124 -945 540 242 -5088 1579 738 1692 1441 -1780 -705 -4994 153 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -12012 -13054 -894 -1115 -701 -1378 * * 112 -267 -5494 2031 -1035 -5815 -462 547 -5566 468 -3141 -2177 2515 -893 214 -168 636 -3960 -5116 -5677 -1561 154 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 395 45 96 359 117 -369 -294 -249 S -37 -7217 -5775 -199 -2956 -701 -1378 * * 113 317 -4118 -6638 -6002 -1683 -5840 -4711 967 -5597 2173 1109 -5486 -2155 -2172 -5397 -795 -515 1593 -4576 -4234 156 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -1 -11986 -13028 -894 -1115 -249 -2655 * * 114 -4021 -5494 -926 
330 -5815 -4995 249 -1257 535 -2345 -684 -302 -5088 2530 1760 566 1315 -1824 -5677 -4994 157 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -12012 -13054 -894 -1115 -701 -1378 * * 115 314 -4196 -6204 -4 -339 -5788 -4635 -1041 -1238 1015 1036 -5283 1680 -220 203 309 -751 665 -586 -169 158 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 G -1 -12012 -13054 -894 -1115 -701 -1378 * * 116 604 -5494 470 1869 -2159 -443 -3653 -1808 -1027 -3105 -4583 0 -5088 -854 -849 2100 -408 -2715 -5677 -4994 159 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 G -1 -12012 -13054 -894 -1115 -701 -1378 * * 117 -6546 -8674 4070 -2417 -2378 -6231 950 -8771 -6133 -8581 -8057 -1147 -6834 -2029 -2310 -6130 -6654 -8200 -8773 -7605 160 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 G -1 -12012 -13054 -894 -1115 -701 -1378 * * 118 -118 -1070 -1438 1223 -2108 -2721 -3882 -1061 -3602 -3086 -692 786 -5274 -958 -4048 2379 1601 -2467 -5287 -1600 161 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -1 -12012 -13054 -894 -1115 -701 -1378 * * 119 1948 -6066 -1655 -6049 -8131 3247 -6409 -7846 -2789 -3162 -7165 -5885 -6760 -6184 -2834 -5524 -5740 -2902 -8114 -7793 162 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 120 -2897 239 -1825 -784 -710 -2674 -4719 1448 -5570 -33 196 -708 -5902 -5205 -974 -230 2478 1198 -4603 -842 163 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 121 -8646 -740 -9029 -9389 -1850 -8901 -5094 -7518 -8946 -3310 -6915 -7520 -8759 -7661 -8304 -8155 -8500 -7688 -4340 4915 164 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 122 -4211 
-655 -5107 -667 2211 -5518 678 -642 -427 -334 -3643 -976 -5588 -240 -789 -196 1246 -2234 1472 2814 165 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 123 -6926 5693 -9498 -9076 -1899 -9335 -8534 -2099 -8983 -1057 -4150 -9036 -8662 -8133 -8733 -8745 -6858 -428 -7193 -7156 166 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 124 1969 -5493 -93 -79 -5813 598 247 -2605 -335 -2340 -184 -958 -5088 1554 -119 -363 -701 212 -5676 -1702 167 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -13 -12012 -6870 -894 -1115 -701 -1378 * * 125 6 -4287 -5637 -1243 -152 -2879 -4468 1517 -291 -322 -41 -2109 -2305 -1095 -183 665 -372 2121 -446 -4346 168 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -16 -12000 -6551 -894 -1115 -1434 -667 * * 126 -315 -5368 -831 -804 227 -206 -236 407 821 -853 638 -1051 -2400 -454 -1191 1102 1 -67 1957 1147 169 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11985 -13027 -894 -1115 -2013 -411 * * 127 -1361 -766 1114 -650 16 200 1032 -1633 -711 -753 -1299 1565 -2429 -455 -702 520 961 -1628 -432 1128 170 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -136 -11985 -3476 -894 -1115 -2013 -411 * * 128 -669 1120 -3745 52 647 667 312 -5311 -443 -105 1031 -134 1094 -654 -429 971 66 -2633 -5482 417 171 - -162 -495 245 51 -376 417 84 -634 203 -472 -741 277 399 26 95 352 119 -381 -311 -192 T -1324 -921 -3787 -1910 -446 -3932 -98 * * 129 -733 -5203 1467 103 -366 233 -3408 -879 -1060 -1480 -1128 1838 670 -311 -3499 121 -487 -2553 -5396 1820 185 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 S -1 -11741 -12783 -894 -1115 -573 -1609 * * 130 -199 -5390 -114 1396 755 87 1049 -211 183 -999 410 
-935 -59 -852 -782 -891 511 -861 -502 544 186 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11930 -12972 -894 -1115 -618 -1520 * * 131 -2430 -5109 1019 -1391 -53 609 253 716 -1368 605 635 -264 -688 359 -953 -766 -2166 690 834 864 187 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -11985 -13027 -894 -1115 -790 -1246 * * 132 -4033 -677 -872 52 -2161 0 -3709 507 -571 -711 -360 309 -515 568 -253 164 681 1177 -5532 1639 188 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12000 -13042 -894 -1115 -380 -2111 * * 133 -1373 -783 -856 -482 3276 -588 -159 -1698 -1048 -995 303 -1140 -2372 -917 -181 -186 -1423 -2751 2445 458 189 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 134 -1372 -5485 -472 47 -460 2636 994 -311 -1242 -1029 -4575 -1196 -1457 650 -2021 -367 -389 -1421 -5671 -1546 190 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 135 15 -793 -308 745 -1934 731 859 -5566 714 -883 -4583 567 99 1152 -474 623 631 -5116 -5677 -4995 191 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 C -1 -12012 -13054 -894 -1115 -701 -1378 * * 136 -1719 -5491 -441 431 -1209 2603 -409 -736 -101 -3082 -1372 -174 -2089 -102 -432 -64 -295 -806 -5675 -1546 192 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 137 -233 -4146 -6600 -5967 -1172 -1835 -370 761 -2744 -1894 84 -5480 -2443 -1998 -1021 -969 3004 997 -4602 591 193 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 138 -2609 -5454 -1984 286 -518 -2629 908 -500 982 -186 137 273 -5099 1120 1836 -644 560 -49 402 -1559 194 - -149 -500 233 43 
-381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 139 -6354 -5920 -8842 -1940 -565 -8399 -7461 -790 -8093 2779 -4075 -8067 -8060 -7485 -7881 -908 -6290 1865 -6708 -6584 195 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 140 -1777 -5322 -1512 -139 -1102 -5049 781 684 -508 -378 180 -616 -2117 794 92 -145 2177 359 -528 -282 196 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 141 -1737 -4140 -6657 -2794 -1893 -1873 -205 528 -5617 -1168 -3343 -5506 -366 -5240 -5417 -2602 -4254 3468 -4598 -1573 197 - -149 -500 233 43 -381 399 106 -626 210 -466 -720 275 394 45 96 359 117 -369 -294 -249 E -1 -12012 -13054 -894 -1115 -701 -1378 * * 142 -4063 -5180 -4075 311 -1132 -1881 493 -1081 679 1218 76 -93 -2300 -1043 469 312 933 844 -5445 -675 198 - * * * * * * * * * * * * * * * * * * * * E * * * * * * * * 0 // input sequence file (quick.txt): >2d9c_A mol:protein length:136 Signal-regulatory protein beta-1 GSSGSSGELQVIQPEKSVSVAAGESATLRCAMTSLIPVGPIMWFRGAGAGRELIYNQKEGHFPRVTTVSELTKRNNLDFSISISNITPADAGTYYCVKFRKGSPDDVEFKSGAGTELSVRAKPSAPVVSGSGPSSG test script (test.pl): #!/usr/bin/perl use strict; use warnings; use Bio::SearchIO; use Bio::Tools::Run::Hmmer; my $factory = Bio::Tools::Run::Hmmer->new( 'program'=>'hmmsearch', 'hmm'=>'./pfam_source/V-set_ls.hmm', 'informat' => 'fasta' ); $factory->verbose(1); my $search = $factory->run("./quick.txt"); while (my $result = $search->next_result){ while(my $hit = $result->next_hit){ while (my $hsp = $hit->next_hsp){ } } } Output from script: --------------------- WARNING --------------------- MSG: Unrecognized line: ++ i W+r +g g eli+ ++ ++++++ t+ +++ +r STACK Bio::SearchIO::hmmer::next_result /opt/perl_modules/lib/perl5//Bio/SearchIO/hmmer.pm:562 STACK toplevel ./test.pl:14 
--------------------------------------------------- --------------------- WARNING --------------------- MSG: Unrecognized line: sgnpskgdfsLtIsnlqlsDsGtYyCavsns.....nelvfgggtrLtVl STACK Bio::SearchIO::hmmer::next_result /opt/perl_modules/lib/perl5//Bio/SearchIO/hmmer.pm:562 STACK toplevel ./test.pl:14 --------------------------------------------------- --------------------- WARNING --------------------- MSG: Unrecognized line: CS STACK Bio::SearchIO::hmmer::next_result /opt/perl_modules/lib/perl5//Bio/SearchIO/hmmer.pm:562 STACK toplevel ./test.pl:14 --------------------------------------------------- Altering line 8 in the file ./pfam_source/V-set_ls.hmm, from: CS yes to CS no stops the error, but as the keyword CS is a valid choice for running/building hmmer files, I feel that there is a bug in how Bio::SearchIO::hmmer parses the hmmer output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From cjfields at dev.open-bio.org Mon May 18 21:29:25 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 18 May 2009 21:29:25 -0400 Subject: [Bioperl-guts-l] [15685] bioperl-live/trunk/Bio/Root/RootI.pm: warn, not throw ( should be consisten) Message-ID: <200905190129.n4J1TPbd012237@dev.open-bio.org> Revision: 15685 Author: cjfields Date: 2009-05-18 21:29:24 -0400 (Mon, 18 May 2009) Log Message: ----------- warn, not throw (should be consisten) Modified Paths: -------------- bioperl-live/trunk/Bio/Root/RootI.pm Modified: bioperl-live/trunk/Bio/Root/RootI.pm =================================================================== --- bioperl-live/trunk/Bio/Root/RootI.pm 2009-05-18 01:59:48 UTC (rev 15684) +++ bioperl-live/trunk/Bio/Root/RootI.pm 2009-05-19 01:29:24 UTC (rev 15685) @@ -230,7 +230,7 @@ $self->throw('Version must be numerical, such as 1.006000 for v1.6.0, not '. 
$version) unless $version =~ /^\d+\.\d+$/; if ($Bio::Root::Version::VERSION >= $version) { - $self->throw($msg) + $self->warn($msg) } } # passing this on to warn() should deal properly with verbosity issues From cjfields at dev.open-bio.org Mon May 18 21:59:15 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 18 May 2009 21:59:15 -0400 Subject: [Bioperl-guts-l] [15686] bioperl-live/trunk/Bio/Root/RootI.pm: * switch back to throw() ( forgot this is supposed to throw) Message-ID: <200905190159.n4J1xFs1012289@dev.open-bio.org> Revision: 15686 Author: cjfields Date: 2009-05-18 21:59:14 -0400 (Mon, 18 May 2009) Log Message: ----------- * switch back to throw() (forgot this is supposed to throw) * add more message info Modified Paths: -------------- bioperl-live/trunk/Bio/Root/RootI.pm Modified: bioperl-live/trunk/Bio/Root/RootI.pm =================================================================== --- bioperl-live/trunk/Bio/Root/RootI.pm 2009-05-19 01:29:24 UTC (rev 15685) +++ bioperl-live/trunk/Bio/Root/RootI.pm 2009-05-19 01:59:14 UTC (rev 15686) @@ -225,12 +225,17 @@ sub deprecated{ my ($self) = shift; my ($msg, $version) = $self->_rearrange([qw(MESSAGE VERSION)], @_); + if (!defined $msg) { + my $prev = (caller(0))[3]; + $msg = "Use of ".$prev."() is deprecated"; + } # delegate to either warn or throw based on whether a version is given if ($version) { $self->throw('Version must be numerical, such as 1.006000 for v1.6.0, not '. 
$version) unless $version =~ /^\d+\.\d+$/; + $msg .= "\nDeprecated in $version"; if ($Bio::Root::Version::VERSION >= $version) { - $self->warn($msg) + $self->throw($msg) } } # passing this on to warn() should deal properly with verbosity issues From maj at dev.open-bio.org Tue May 19 12:27:43 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 12:27:43 -0400 Subject: [Bioperl-guts-l] [15687] bioperl-live/trunk/t/data/contig-by-hand.wublastp: this is a contig in blast format done by hand, Message-ID: <200905191627.n4JGRhTY014702@dev.open-bio.org> Revision: 15687 Author: maj Date: 2009-05-19 12:27:41 -0400 (Tue, 19 May 2009) Log Message: ----------- this is a contig in blast format done by hand, for non-kludgy testing of B:S:SU::tile_hsps() Added Paths: ----------- bioperl-live/trunk/t/data/contig-by-hand.wublastp Added: bioperl-live/trunk/t/data/contig-by-hand.wublastp =================================================================== --- bioperl-live/trunk/t/data/contig-by-hand.wublastp (rev 0) +++ bioperl-live/trunk/t/data/contig-by-hand.wublastp 2009-05-19 16:27:41 UTC (rev 15687) @@ -0,0 +1,122 @@ +BLASTP 2.0MP-WashU [12-Feb-2001] [linux-i686 01:36:08 31-Jan-2001] + +Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA. +All Rights Reserved. + +Reference: Gish, W. (1996-2000) http://blast.wustl.edu + +Query= gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I, homoserine + dehydrogenase I [Escherichia coli] + (820 letters) + +Database: ecoli.aa + 4289 sequences; 1,358,990 total letters. 
+Searching....10....20....30....40....50....60....70....80....90....100% done + + Smallest + Sum + High Probability +Sequences producing High-scoring Segment Pairs: Score P(N) N + +gb|AAC76922.1| (AE000468) aspartokinase II (contig by hand) 999 0.0e-00 1 + + + +>gb|AAC76922.1| (AE000468) aspartokinase II (contig by hand) + + Length = 810 + + Score = 999 (999.9 bits), Expect = 0.0e-00, P = 0.0e-00 + Identities = 250/810 (31%), Positives = 413/810 (51%) + + +Query: 5 KFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNI 64 + KFGG+S+A+ + +LRVA I+ ++ + V+SA TN L+ ++ + + + + + +Sbjct: 16 KFGGSSLADVKCYLRVAGIMAEYSQPDDMM-VVSAAGSTTNQLINWLKLSQTDRLSAHQV 74 + +Query: 65 SDAERIF-AELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLH-GISLLGQCPDSINAAL 122 + R + +L++GL A+ L + FV + ++ +L GI+ D++ A + +Sbjct: 75 QQTLRRYQCDLISGLLPAEEADSL--ISAFVS-DLERLAALLDSGIN------DAVYAEV 125 + +Query: 123 ICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRI--P 180 + + GE S +M+ VL +G +D E L A + VD S + + P +Sbjct: 126 VGHGEVWSARLMSAVLNQQGLPAAWLDAREFLRAE-RAAQPQVDEGLSYPLLQQLLVQHP 184 + +Query: 181 ADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQV 240 + +V+ GF + N GE V+LGRNGSDYSA + A IW+DV GVY+ DPR+V +Sbjct: 185 GKRLVV-TGFISRNNAGETVLLGRNGSDYSATQIGALAGVSRVTIWSDVAGVYSADPRKV 243 + +Query: 241 PDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRD 300 + DA LL + EA EL+ A VLH RT+ P++ +I ++ + P T I +Sbjct: 244 KDACLLPLLRLDEASELARLAAPVLHARTLQPVSGSEIDLQLRCSYTPDQGSTRIERVLA 303 + +Query: 301 EDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISF 360 + + +++ +++ + P + + + RA++ + + + + F +Sbjct: 304 SGT-GARIVTSHDDVCLIEFQVPASQDFKLAHKEIDQILKRAQVRPLAVGVHNDRQLLQF 362 + +Query: 361 CVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGIS 413 + C A + + E GL L + + LA++++VG G+ T + +Sbjct: 363 CYTSEVADSALKILDEA-------GLPGELRLRQGLALVAMVGAGV-TRNPLH 407 + +Query: 414 A-KFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIG 472 + +F+ L + Q S+ V+ + ++ HQ +F ++ I + + G +Sbjct: 408 
CHRFWQQLKGQPVEFTW--QSDDGISLVAVLRTGPTESLIQGLHQSVFRAEKRIGLVLFG 465 + +Query: 473 VGGVGGALLEQLKRQQSWLKNKH-IDLRVCGVANSKALLTNVHGLN----LENWQEELAQ 527 + G +G LE R+QS L + + + GV +S+ L + GL+ L + +E + +Sbjct: 466 KGNIGSRWLELFAREQSTLSARTGFEFVLAGVVDSRRSLLSYDGLDASRALAFFNDEAVE 525 + +Query: 528 AKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMD 587 + E L ++ + + V++D T+SQ +ADQY DF GFHV++ NK A S + +Sbjct: 526 QDEE----SLFLWMRAHPYDDLVVLDVTASQQLADQYLDFASHGFHVISANKLAGASDSN 581 + +Query: 588 YYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKL 647 + Y Q+ A EK+ R +LY+ VGAGLP+ +++L+++GD ++ SGI SG+LS++F + +Sbjct: 582 KYRQIHDAFEKTGRHWLYNATVGAGLPINHTVRDLIDSGDTILSISGIFSGTLSWLFLQF 641 + +Query: 648 DEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEPVLPA 707 + D + F+E A + G TEPDPRDDLSG DV RKL+ILARE G +E + +E ++PA +Sbjct: 642 DGSVPFTELVDQAWQQGLTEPDPRDDLSGKDVMRKLVILAREAGYNIEPDQVRVESLVPA 701 + +Query: 708 EFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDP 767 + G + F N +L++ R+ AR+ G VLRYV D +G RV + V + P +Sbjct: 702 HCEG-GSIDHFFENGDELNEQMVQRLEAAREMGLVLRYVARFDANGKARVGVEAVREDHP 760 + +Query: 768 LFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR 812 + L + +N A S +Y+ PLV+RG GAG DVTA + +D+ R +Sbjct: 761 LASLLPCDNVFAIESRWYRDNPLVIRGPGAGRDVTAGAIQSDINR 805 + + +Parameters: + E=0.01 + + ctxfactor=1.00 + + Query ----- As Used ----- ----- Computed ---- + Frame MatID Matrix name Lambda K H Lambda K H + +0 0 BLOSUM62 0.319 0.136 0.384 same same same + Q=9,R=2 0.244 0.0300 0.180 n/a n/a n/a + + Query + Frame MatID Length Eff.Length E S W T X E2 S2 + +0 0 820 820 0.010 93 3 11 22 0.19 34 + 37 0.22 37 + + +Statistics: + + Database: /home/jes12/db/ecoli.aa + Title: ecoli.aa + Posted: 2:52:35 PM EST Nov 18, 2001 + Created: 9:46:47 AM EST Nov 18, 2001 + Format: XDF-1 + # of letters in database: 1,358,990 + # of sequences in database: 4289 + # of database sequences satisfying E: 4 + No. 
of states in DFA: 573 (61 KB) + Total size of DFA: 281 KB (1149 KB) + Time to generate neighborhood: 0.00u 0.02s 0.02t Elapsed: 00:00:00 + No. of threads or processors used: 1 + Search cpu time: 1.58u 0.00s 1.58t Elapsed: 00:00:01 + Total cpu time: 1.59u 0.02s 1.61t Elapsed: 00:00:01 + Start: Thu Dec 6 11:09:14 2001 End: Thu Dec 6 11:09:15 2001 Property changes on: bioperl-live/trunk/t/data/contig-by-hand.wublastp ___________________________________________________________________ Name: svn:executable + * From maj at dev.open-bio.org Tue May 19 12:29:27 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 12:29:27 -0400 Subject: [Bioperl-guts-l] [15688] bioperl-live/trunk/t/SearchIO/blast.t: mods to test B:S:SU:: tile_hsps against a Message-ID: <200905191629.n4JGTRJM014733@dev.open-bio.org> Revision: 15688 Author: maj Date: 2009-05-19 12:29:27 -0400 (Tue, 19 May 2009) Log Message: ----------- mods to test B:S:SU::tile_hsps against a human-contigged pair of original HSPs Modified Paths: -------------- bioperl-live/trunk/t/SearchIO/blast.t Modified: bioperl-live/trunk/t/SearchIO/blast.t =================================================================== --- bioperl-live/trunk/t/SearchIO/blast.t 2009-05-19 16:27:41 UTC (rev 15687) +++ bioperl-live/trunk/t/SearchIO/blast.t 2009-05-19 16:29:27 UTC (rev 15688) @@ -4,6 +4,7 @@ use strict; BEGIN { + chdir("c:/cygwin/usr/local/lib/perl5/bioperl-trunk"); use lib '.'; use Bio::Root::Test; @@ -132,17 +133,33 @@ if ($count==1) { # Test HSP contig data returned by SearchUtils::tile_hsps() # Second hit has two hsps that overlap. 
+ + # compare with the contig made by hand for these two contigs + # in t/data/contig-by-hand.wublastp + # (in this made-up file, the hsps from ecolitst.wublastp + # were aligned and contiged, and Length, Identities, Positives + # were counted, by a human (maj) ) + + my $hand_hit = Bio::SearchIO->new( + -format=>'blast', + -file=>test_input_file('contig-by-hand.wublastp') + )->next_result->next_hit; + my $hand_hsp = $hand_hit->next_hsp; + my @hand_qrng = $hand_hsp->range('query'); + my @hand_srng = $hand_hsp->range('hit'); + my @hand_matches = $hand_hit->matches; + my($qcontigs, $scontigs) = Bio::Search::SearchUtils::tile_hsps($hit); # Query contigs - is($qcontigs->[0]->{'start'}, 5); - is($qcontigs->[0]->{'stop'}, 812); - is($qcontigs->[0]->{'iden'}, 250); - is($qcontigs->[0]->{'cons'}, 413); + is($qcontigs->[0]->{'start'}, $hand_qrng[0]); + is($qcontigs->[0]->{'stop'}, $hand_qrng[1]); + is($qcontigs->[0]->{'iden'}, $hand_matches[0]); + is($qcontigs->[0]->{'cons'}, $hand_matches[1]); # Subject contigs - is($scontigs->[0]->{'start'}, 16); - is($scontigs->[0]->{'stop'}, 805); - is($scontigs->[0]->{'iden'}, 248); - is($scontigs->[0]->{'cons'}, 410); + is($scontigs->[0]->{'start'}, $hand_srng[0]); + is($scontigs->[0]->{'stop'}, $hand_srng[1]); + is($scontigs->[0]->{'iden'}, $hand_matches[0]); + is($scontigs->[0]->{'cons'}, $hand_matches[1]); } is($hit->name, shift @$d); From maj at dev.open-bio.org Tue May 19 12:30:41 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 12:30:41 -0400 Subject: [Bioperl-guts-l] [15689] bioperl-live/trunk/Bio/Search/HSP/HSPI.pm: this patch allows B:S: HSP::HSPI::matches to Message-ID: <200905191630.n4JGUfbH014764@dev.open-bio.org> Revision: 15689 Author: maj Date: 2009-05-19 12:30:41 -0400 (Tue, 19 May 2009) Log Message: ----------- this patch allows B:S:HSP::HSPI::matches to handle gapped HSPs correctly Modified Paths: -------------- bioperl-live/trunk/Bio/Search/HSP/HSPI.pm Modified: 
bioperl-live/trunk/Bio/Search/HSP/HSPI.pm =================================================================== --- bioperl-live/trunk/Bio/Search/HSP/HSPI.pm 2009-05-19 16:29:27 UTC (rev 15688) +++ bioperl-live/trunk/Bio/Search/HSP/HSPI.pm 2009-05-19 16:30:41 UTC (rev 15689) @@ -672,6 +672,7 @@ ## Get data for the whole alignment. push @data, ($self->num_identical, $self->num_conserved); } else { + ## Get the substring representing the desired sub-section of aln. $beg ||= 0; $end ||= 0; @@ -681,20 +682,34 @@ if($end > $stop) { $end = $stop; } if($beg < $start) { $beg = $start; } + + # now with gap handling! /maj + my $match_str = $self->seq_str('match'); + if ($self->gaps) { + # strip the homology string of gap positions relative + # to the target type + $match_str = $self->seq_str('match'); + my $tgt = $self->seq_str($seqType); + my $encode = $match_str ^ $tgt; + my $zap = '-'^' '; + $encode =~ s/$zap//g; + $tgt =~ s/-//g; + $match_str = $tgt ^ $encode; + } ## ML: START fix for substr out of range error ------------------ my $seq = ""; if (($self->algorithm =~ /TBLAST[NX]/) && ($seqType eq 'sbjct')) { - $seq = substr($self->seq_str('match'), + $seq = substr($match_str, int(($beg-$start)/3), int(($end-$beg+1)/3)); } elsif (($self->algorithm =~ /T?BLASTX/) && ($seqType eq 'query')) { - $seq = substr($self->seq_str('match'), + $seq = substr($match_str, int(($beg-$start)/3), int(($end-$beg+1)/3)); } else { - $seq = substr($self->seq_str('match'), + $seq = substr($match_str, $beg-$start, ($end-$beg+1)); } ## ML: End of fix for substr out of range error ----------------- From maj at dev.open-bio.org Tue May 19 12:36:17 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 12:36:17 -0400 Subject: [Bioperl-guts-l] [15690] bioperl-live/trunk/t/data/contig-by-hand.wublastp: unixize ( sorry about that...) 
Message-ID: <200905191636.n4JGaHRC014815@dev.open-bio.org> Revision: 15690 Author: maj Date: 2009-05-19 12:36:17 -0400 (Tue, 19 May 2009) Log Message: ----------- unixize (sorry about that...) Modified Paths: -------------- bioperl-live/trunk/t/data/contig-by-hand.wublastp Modified: bioperl-live/trunk/t/data/contig-by-hand.wublastp =================================================================== --- bioperl-live/trunk/t/data/contig-by-hand.wublastp 2009-05-19 16:30:41 UTC (rev 15689) +++ bioperl-live/trunk/t/data/contig-by-hand.wublastp 2009-05-19 16:36:17 UTC (rev 15690) @@ -1,122 +1,122 @@ -BLASTP 2.0MP-WashU [12-Feb-2001] [linux-i686 01:36:08 31-Jan-2001] - -Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA. -All Rights Reserved. - -Reference: Gish, W. (1996-2000) http://blast.wustl.edu - -Query= gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I, homoserine - dehydrogenase I [Escherichia coli] - (820 letters) - -Database: ecoli.aa - 4289 sequences; 1,358,990 total letters. 
-Searching....10....20....30....40....50....60....70....80....90....100% done - - Smallest - Sum - High Probability -Sequences producing High-scoring Segment Pairs: Score P(N) N - -gb|AAC76922.1| (AE000468) aspartokinase II (contig by hand) 999 0.0e-00 1 - - - ->gb|AAC76922.1| (AE000468) aspartokinase II (contig by hand) - - Length = 810 - - Score = 999 (999.9 bits), Expect = 0.0e-00, P = 0.0e-00 - Identities = 250/810 (31%), Positives = 413/810 (51%) - - -Query: 5 KFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNI 64 - KFGG+S+A+ + +LRVA I+ ++ + V+SA TN L+ ++ + + + + + -Sbjct: 16 KFGGSSLADVKCYLRVAGIMAEYSQPDDMM-VVSAAGSTTNQLINWLKLSQTDRLSAHQV 74 - -Query: 65 SDAERIF-AELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLH-GISLLGQCPDSINAAL 122 - R + +L++GL A+ L + FV + ++ +L GI+ D++ A + -Sbjct: 75 QQTLRRYQCDLISGLLPAEEADSL--ISAFVS-DLERLAALLDSGIN------DAVYAEV 125 - -Query: 123 ICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRI--P 180 - + GE S +M+ VL +G +D E L A + VD S + + P -Sbjct: 126 VGHGEVWSARLMSAVLNQQGLPAAWLDAREFLRAE-RAAQPQVDEGLSYPLLQQLLVQHP 184 - -Query: 181 ADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQV 240 - +V+ GF + N GE V+LGRNGSDYSA + A IW+DV GVY+ DPR+V -Sbjct: 185 GKRLVV-TGFISRNNAGETVLLGRNGSDYSATQIGALAGVSRVTIWSDVAGVYSADPRKV 243 - -Query: 241 PDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRD 300 - DA LL + EA EL+ A VLH RT+ P++ +I ++ + P T I -Sbjct: 244 KDACLLPLLRLDEASELARLAAPVLHARTLQPVSGSEIDLQLRCSYTPDQGSTRIERVLA 303 - -Query: 301 EDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISF 360 - + +++ +++ + P + + + RA++ + + + + F -Sbjct: 304 SGT-GARIVTSHDDVCLIEFQVPASQDFKLAHKEIDQILKRAQVRPLAVGVHNDRQLLQF 362 - -Query: 361 CVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGIS 413 - C A + + E GL L + + LA++++VG G+ T + -Sbjct: 363 CYTSEVADSALKILDEA-------GLPGELRLRQGLALVAMVGAGV-TRNPLH 407 - -Query: 414 A-KFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIG 472 - +F+ L + Q S+ V+ + ++ HQ +F ++ I + + G -Sbjct: 408 
CHRFWQQLKGQPVEFTW--QSDDGISLVAVLRTGPTESLIQGLHQSVFRAEKRIGLVLFG 465 - -Query: 473 VGGVGGALLEQLKRQQSWLKNKH-IDLRVCGVANSKALLTNVHGLN----LENWQEELAQ 527 - G +G LE R+QS L + + + GV +S+ L + GL+ L + +E + -Sbjct: 466 KGNIGSRWLELFAREQSTLSARTGFEFVLAGVVDSRRSLLSYDGLDASRALAFFNDEAVE 525 - -Query: 528 AKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMD 587 - E L ++ + + V++D T+SQ +ADQY DF GFHV++ NK A S + -Sbjct: 526 QDEE----SLFLWMRAHPYDDLVVLDVTASQQLADQYLDFASHGFHVISANKLAGASDSN 581 - -Query: 588 YYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKL 647 - Y Q+ A EK+ R +LY+ VGAGLP+ +++L+++GD ++ SGI SG+LS++F + -Sbjct: 582 KYRQIHDAFEKTGRHWLYNATVGAGLPINHTVRDLIDSGDTILSISGIFSGTLSWLFLQF 641 - -Query: 648 DEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEPVLPA 707 - D + F+E A + G TEPDPRDDLSG DV RKL+ILARE G +E + +E ++PA -Sbjct: 642 DGSVPFTELVDQAWQQGLTEPDPRDDLSGKDVMRKLVILAREAGYNIEPDQVRVESLVPA 701 - -Query: 708 EFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDP 767 - G + F N +L++ R+ AR+ G VLRYV D +G RV + V + P -Sbjct: 702 HCEG-GSIDHFFENGDELNEQMVQRLEAAREMGLVLRYVARFDANGKARVGVEAVREDHP 760 - -Query: 768 LFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR 812 - L + +N A S +Y+ PLV+RG GAG DVTA + +D+ R -Sbjct: 761 LASLLPCDNVFAIESRWYRDNPLVIRGPGAGRDVTAGAIQSDINR 805 - - -Parameters: - E=0.01 - - ctxfactor=1.00 - - Query ----- As Used ----- ----- Computed ---- - Frame MatID Matrix name Lambda K H Lambda K H - +0 0 BLOSUM62 0.319 0.136 0.384 same same same - Q=9,R=2 0.244 0.0300 0.180 n/a n/a n/a - - Query - Frame MatID Length Eff.Length E S W T X E2 S2 - +0 0 820 820 0.010 93 3 11 22 0.19 34 - 37 0.22 37 - - -Statistics: - - Database: /home/jes12/db/ecoli.aa - Title: ecoli.aa - Posted: 2:52:35 PM EST Nov 18, 2001 - Created: 9:46:47 AM EST Nov 18, 2001 - Format: XDF-1 - # of letters in database: 1,358,990 - # of sequences in database: 4289 - # of database sequences satisfying E: 4 - No. 
of states in DFA: 573 (61 KB) - Total size of DFA: 281 KB (1149 KB) - Time to generate neighborhood: 0.00u 0.02s 0.02t Elapsed: 00:00:00 - No. of threads or processors used: 1 - Search cpu time: 1.58u 0.00s 1.58t Elapsed: 00:00:01 - Total cpu time: 1.59u 0.02s 1.61t Elapsed: 00:00:01 - Start: Thu Dec 6 11:09:14 2001 End: Thu Dec 6 11:09:15 2001 +BLASTP 2.0MP-WashU [12-Feb-2001] [linux-i686 01:36:08 31-Jan-2001] + +Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA. +All Rights Reserved. + +Reference: Gish, W. (1996-2000) http://blast.wustl.edu + +Query= gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I, homoserine + dehydrogenase I [Escherichia coli] + (820 letters) + +Database: ecoli.aa + 4289 sequences; 1,358,990 total letters. +Searching....10....20....30....40....50....60....70....80....90....100% done + + Smallest + Sum + High Probability +Sequences producing High-scoring Segment Pairs: Score P(N) N + +gb|AAC76922.1| (AE000468) aspartokinase II (contig by hand) 999 0.0e-00 1 + + + +>gb|AAC76922.1| (AE000468) aspartokinase II (contig by hand) + + Length = 810 + + Score = 999 (999.9 bits), Expect = 0.0e-00, P = 0.0e-00 + Identities = 250/810 (31%), Positives = 413/810 (51%) + + +Query: 5 KFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNI 64 + KFGG+S+A+ + +LRVA I+ ++ + V+SA TN L+ ++ + + + + + +Sbjct: 16 KFGGSSLADVKCYLRVAGIMAEYSQPDDMM-VVSAAGSTTNQLINWLKLSQTDRLSAHQV 74 + +Query: 65 SDAERIF-AELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLH-GISLLGQCPDSINAAL 122 + R + +L++GL A+ L + FV + ++ +L GI+ D++ A + +Sbjct: 75 QQTLRRYQCDLISGLLPAEEADSL--ISAFVS-DLERLAALLDSGIN------DAVYAEV 125 + +Query: 123 ICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRI--P 180 + + GE S +M+ VL +G +D E L A + VD S + + P +Sbjct: 126 VGHGEVWSARLMSAVLNQQGLPAAWLDAREFLRAE-RAAQPQVDEGLSYPLLQQLLVQHP 184 + +Query: 181 ADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQV 240 + +V+ GF + N GE V+LGRNGSDYSA + A IW+DV GVY+ DPR+V +Sbjct: 185 
GKRLVV-TGFISRNNAGETVLLGRNGSDYSATQIGALAGVSRVTIWSDVAGVYSADPRKV 243 + +Query: 241 PDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRD 300 + DA LL + EA EL+ A VLH RT+ P++ +I ++ + P T I +Sbjct: 244 KDACLLPLLRLDEASELARLAAPVLHARTLQPVSGSEIDLQLRCSYTPDQGSTRIERVLA 303 + +Query: 301 EDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISF 360 + + +++ +++ + P + + + RA++ + + + + F +Sbjct: 304 SGT-GARIVTSHDDVCLIEFQVPASQDFKLAHKEIDQILKRAQVRPLAVGVHNDRQLLQF 362 + +Query: 361 CVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGIS 413 + C A + + E GL L + + LA++++VG G+ T + +Sbjct: 363 CYTSEVADSALKILDEA-------GLPGELRLRQGLALVAMVGAGV-TRNPLH 407 + +Query: 414 A-KFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIG 472 + +F+ L + Q S+ V+ + ++ HQ +F ++ I + + G +Sbjct: 408 CHRFWQQLKGQPVEFTW--QSDDGISLVAVLRTGPTESLIQGLHQSVFRAEKRIGLVLFG 465 + +Query: 473 VGGVGGALLEQLKRQQSWLKNKH-IDLRVCGVANSKALLTNVHGLN----LENWQEELAQ 527 + G +G LE R+QS L + + + GV +S+ L + GL+ L + +E + +Sbjct: 466 KGNIGSRWLELFAREQSTLSARTGFEFVLAGVVDSRRSLLSYDGLDASRALAFFNDEAVE 525 + +Query: 528 AKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMD 587 + E L ++ + + V++D T+SQ +ADQY DF GFHV++ NK A S + +Sbjct: 526 QDEE----SLFLWMRAHPYDDLVVLDVTASQQLADQYLDFASHGFHVISANKLAGASDSN 581 + +Query: 588 YYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKL 647 + Y Q+ A EK+ R +LY+ VGAGLP+ +++L+++GD ++ SGI SG+LS++F + +Sbjct: 582 KYRQIHDAFEKTGRHWLYNATVGAGLPINHTVRDLIDSGDTILSISGIFSGTLSWLFLQF 641 + +Query: 648 DEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEPVLPA 707 + D + F+E A + G TEPDPRDDLSG DV RKL+ILARE G +E + +E ++PA +Sbjct: 642 DGSVPFTELVDQAWQQGLTEPDPRDDLSGKDVMRKLVILAREAGYNIEPDQVRVESLVPA 701 + +Query: 708 EFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDP 767 + G + F N +L++ R+ AR+ G VLRYV D +G RV + V + P +Sbjct: 702 HCEG-GSIDHFFENGDELNEQMVQRLEAAREMGLVLRYVARFDANGKARVGVEAVREDHP 760 + +Query: 768 LFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR 812 @@ Diff output truncated at 10000 characters. 
@@ From maj at dev.open-bio.org Tue May 19 12:38:00 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 12:38:00 -0400 Subject: [Bioperl-guts-l] [15691] bioperl-live/trunk/Bio/Search/SearchUtils.pm: update the secret pod Message-ID: <200905191638.n4JGc02m014846@dev.open-bio.org> Revision: 15691 Author: maj Date: 2009-05-19 12:38:00 -0400 (Tue, 19 May 2009) Log Message: ----------- update the secret pod Modified Paths: -------------- bioperl-live/trunk/Bio/Search/SearchUtils.pm Modified: bioperl-live/trunk/Bio/Search/SearchUtils.pm =================================================================== --- bioperl-live/trunk/Bio/Search/SearchUtils.pm 2009-05-19 16:36:17 UTC (rev 15690) +++ bioperl-live/trunk/Bio/Search/SearchUtils.pm 2009-05-19 16:38:00 UTC (rev 15691) @@ -372,8 +372,9 @@ # Throws : Exceptions propagated from Bio::Search::Hit::BlastHSP::matches() # : for invalid sub-sequence ranges. # Status : Experimental -# Comments : This method does not currently support gapped alignments. -# : Also, it does not keep track of the number of HSPs that +# Comments : This method supports gapped alignments through a patch by maj +# : to B:S:HSP:HSPI::matches(). +# : It does not keep track of the number of HSPs that # : overlap within the amount specified by overlap(). # : This will lead to significant tracking errors for large # : overlap values. 
From maj at dev.open-bio.org Tue May 19 15:11:56 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 15:11:56 -0400 Subject: [Bioperl-guts-l] [15692] bioperl-live/trunk/t/SearchIO/blast.t: bye-bye cruft Message-ID: <200905191911.n4JJBu8p015181@dev.open-bio.org> Revision: 15692 Author: maj Date: 2009-05-19 15:11:56 -0400 (Tue, 19 May 2009) Log Message: ----------- bye-bye cruft Modified Paths: -------------- bioperl-live/trunk/t/SearchIO/blast.t Modified: bioperl-live/trunk/t/SearchIO/blast.t =================================================================== --- bioperl-live/trunk/t/SearchIO/blast.t 2009-05-19 16:38:00 UTC (rev 15691) +++ bioperl-live/trunk/t/SearchIO/blast.t 2009-05-19 19:11:56 UTC (rev 15692) @@ -4,7 +4,6 @@ use strict; BEGIN { - chdir("c:/cygwin/usr/local/lib/perl5/bioperl-trunk"); use lib '.'; use Bio::Root::Test; From maj at dev.open-bio.org Tue May 19 15:24:02 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 19 May 2009 15:24:02 -0400 Subject: [Bioperl-guts-l] [15693] bioperl-dev/trunk: Hit tiling object + new algorithm Message-ID: <200905191924.n4JJO2fU015212@dev.open-bio.org> Revision: 15693 Author: maj Date: 2009-05-19 15:24:02 -0400 (Tue, 19 May 2009) Log Message: ----------- Hit tiling object + new algorithm Modified Paths: -------------- bioperl-dev/trunk/Bio/DB/HIV/HIVXmlSchema.pm Added Paths: ----------- bioperl-dev/trunk/Bio/Search/ bioperl-dev/trunk/Bio/Search/Tiling/ bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm bioperl-dev/trunk/t/SearchIO/ bioperl-dev/trunk/t/SearchIO/Tiling.t Modified: bioperl-dev/trunk/Bio/DB/HIV/HIVXmlSchema.pm =================================================================== --- bioperl-dev/trunk/Bio/DB/HIV/HIVXmlSchema.pm 2009-05-19 19:11:56 UTC (rev 15692) +++ bioperl-dev/trunk/Bio/DB/HIV/HIVXmlSchema.pm 2009-05-19 19:24:02 UTC (rev 15693) @@ -155,6 +155,147 @@ 
use Bio::Phylo::Factory; use constant NEXML => 'http://www.nexml.org/1.0'; +=head2 make_nexml_from_query_s + + Title : make_nexml_from_query_s + Usage : $db->make_nexml_from_query_s( $hiv_query_object ) + Function: Create a NeXML-compliant XML document containing + sequences (not annotations; see + Bio::DB::Query::HIV::make_XML_with_ids() + for that) associated with a Bio::DB::Query::HIVQuery + object + Example : + Returns : NeXML-compliant XML document as string + Args : Bio::DB::Query::HIVQuery object; [optional] array of + LANL sequence ids. + Notes : Requires Rutger Vos' external package Bio::Phylo. + This version of make_nexml_from_query is implemented + as a "write stream" from Bio::Phylo-produced NeXML + into a DOM object under XML::LibXML. Each Bio::Phylo + taxon and datum object produced from each sequence + read from the Bio::SeqIO stream is converted into + NeXML separately, converted to XML::LibXML + elements, and added directly under the DOM nodes + tagged 'otus' and 'matrix' respectively. This may + avoid memory problems I faced when building the + entire NEXUS representation first in Bio::Phylo, + then writing to a XML::LibXML document. 
+ +=cut + +sub make_nexml_from_query_s { + my ($self, @args) = @_; + my ($q) = @args; + + my $bpf = Bio::Phylo::Factory->new; + my $seqio = $self->get_Stream_by_query( $q ); + my $dat_obj = $bpf->create_datum(); + my $taxon_obj = $bpf->create_taxon(); + + my ($mx, $taxa, $alphabet); + my ($xrdr, $dom, $otus_elt, $characters_elt, $matrix_elt); + + my $doc = XML::LibXML::Document->new(); + + # create the DOM, with NexML doc header + $dom = XML::LibXML::Element->new('nexml'); + $dom->setNamespace(NEXML, 'nex'); + $dom->setAttribute('version', '0.8'); + $dom->setAttribute('generator', 'Bio::DB::HIVXmlSchema'); + $dom->setAttribute('xmlns', NEXML); + + # let's try making empty matrix and taxa block nodes within the dom + # to be accessed on the fly: + + my $seq1 = $seqio->next_seq; #peek + $alphabet = $seq1->alphabet; + $mx = $bpf->create_matrix( -type=>$seq1->alphabet ); + $taxa = $bpf->create_taxa(); + # create the linkage + $mx->set_taxa($taxa); + + # create 'otus' and 'characters' elements in the DOM, and get + # refs to the meaty bits + $xrdr = XML::LibXML::Reader->new( string => join('', '', + $taxa->to_xml, + $mx->to_xml(-compact => 1), + '')); + $xrdr->read; + # the (1) argument gets a deep copy of the node... + ( $otus_elt, $characters_elt ) = $xrdr->copyCurrentNode(1)->childNodes; + ($matrix_elt) = $characters_elt->getChildrenByTagName('matrix'); + + # add to the DOM + $dom->addChild($otus_elt); + $dom->addChild($characters_elt); + + + while ( my $seq = $seq1 || $seqio->next_seq ) { + undef $seq1; + + $self->throw( "Mixed data NeXML not currently implemented" ) if + $seq->alphabet ne $alphabet; + + my ($taxon, $datum); + #create elements... + $taxon = $taxon_obj->new( -name => $seq->id, + -desc => $seq->annotation->get_value('Special','accession')); + + $datum = $dat_obj->new_from_bioperl($seq); + #create the link + $taxon->set_data($datum); + # write the new elements into the DOM...
+ +# no longer using the B:P native facility for insertion +# $taxa->insert( $taxon ); +# $mx->insert( $datum); + + $xrdr = XML::LibXML::Reader->new( string => + join('', '', + $taxon->to_xml, + $datum->to_xml(-compact => 1), + '')); + $xrdr->read; + my ($otu_elt, $row_elt) = $xrdr->copyCurrentNode(1)->childNodes; + + # put the new row in the matrix + $matrix_elt->addChild($row_elt); + + # create the otu element, adding the LANL id and GenBank accn. + # as 'dict' elements... + my ($lanlid, $tmp, $lanlid_elt, $gbaccn_elt); + $lanlid_elt = XML::LibXML::Element->new('dict'); + $lanlid_elt->setAttribute('id', "dict$$lanlid_elt"); # uniquify + + $lanlid = $otu_elt->getAttribute('label'); + $tmp = XML::LibXML::Element->new('string'); + $tmp->setAttribute('id', "LANLSeqId_$$lanlid_elt"); + $tmp->addChild( XML::LibXML::Text->new($lanlid ) ); + $lanlid_elt->addChild($tmp); + + $gbaccn_elt = XML::LibXML::Element->new('dict'); + $gbaccn_elt->setAttribute('id', "dict$$gbaccn_elt"); + + $tmp = XML::LibXML::Element->new('string'); + $tmp->setAttribute('id', "GenBankAccn_$$gbaccn_elt"); + $tmp->addChild( XML::LibXML::Text->new($q->get_accessions_by_id($lanlid))); + $gbaccn_elt->addChild($tmp); + + $otu_elt->addChild($lanlid_elt); + $otu_elt->addChild($gbaccn_elt); + + # put the otu elt in the otus elt + $otus_elt->addChild($otu_elt); + + 1; + } + + $doc->setDocumentElement($dom); + + return $doc->toString(1); + +} + =head2 make_nexml_from_query Title : make_nexml_from_query Added: bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm (rev 0) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-19 19:24:02 UTC (rev 15693) @@ -0,0 +1,225 @@ +#$Id: MapTileUtils.pm 433 2009-05-19 19:19:38Z maj $ +package Bio::Search::Tiling::MapTileUtils; +use strict; +use warnings; +use Exporter; + +BEGIN { + our @ISA = qw( Exporter ); + our @EXPORT = qw( 
get_intervals_from_hsps interval_tiling decompose_interval ); +} + +# tiling trials +# assumed: intervals are [$a0, $a1], with $a0 <= $a1 +=head1 NAME + +Bio::Search::Tiling::MapTileUtils - utilities for manipulating closed intervals for an HSP tiling algorithm + +=head1 SYNOPSIS + +Not used directly. + +=head1 DESCRIPTION + +=head1 NOTE + +An "interval" in this module is defined as an arrayref C<[$a0, $a1]>, where +C<$a0, $a1> are scalar numbers satisfying C<$a0 E<lt>= $a1>. + +=head1 AUTHOR + +Mark A. Jensen + +=head1 APPENDIX + +=head2 interval_tiling + + Title : interval_tiling() + Usage : @tiling = interval_tiling( \@array_of_intervals ) + Function: Find minimal set of intervals covering the input set + Returns : array of arrayrefs of the form + ( [$interval => [ @indices_of_collapsed_input_intervals ]], ...) + Args : arrayref of intervals + +=cut + +sub interval_tiling { + return unless $_[0]; # no input + my $n = scalar @{$_[0]}; + my %input; + @input{(0..$n-1)} = @{$_[0]}; + my @active = (0..$n-1); + my @hold; + my @tiled_ints; + my @ret; + while (@active) { + my $tgt = $input{my $tgt_i = shift @active}; + push @tiled_ints, $tgt_i; + my $tgt_not_disjoint = 1; + while ($tgt_not_disjoint) { + $tgt_not_disjoint = 0; + while (my $try_i = shift @active) { + my $try = $input{$try_i}; + if ( !are_disjoint($tgt, $try) ) { + $tgt = min_covering_interval($tgt,$try); + push @tiled_ints, $try_i; + $tgt_not_disjoint = 1; + } + else { + push @hold, $try_i; + } + } + if (!$tgt_not_disjoint) { + push @ret, [ $tgt => [@tiled_ints] ]; + @tiled_ints = (); + } + @active = @hold; + @hold = (); + } + } + return @ret; +} + +=head2 decompose_interval + + Title : decompose_interval + Usage : @decomposition = decompose_interval( \@overlappers ) + Function: Calculate the disjoint decomposition of a set of + overlapping intervals, each annotated with a list of + covering intervals + Returns : array of arrayrefs of the form + ( [[@interval] => [@indices_of_coverers]], ...
) + Args : arrayref of intervals (arrayrefs like [$a0, $a1], with + Note : Each returned interval is associated with a list of indices of the + original intervals that cover that decomposition component + (scalar size of this list could be called the 'coverage coefficient') + Note : Coverage: each component of the decomp is completely contained + in the input intervals that overlap it, by construction. + Caveat : This routine expects the members of @overlappers to overlap, + but doesn't check this. + +=cut + +### what if the input intervals don't overlap?? They MUST overlap; that's +### what interval_tiling() is for. + +sub decompose_interval { + return unless $_[0]; # no input + my @ints = @{$_[0]}; + my (%flat, @flat); + ### this is ok, but need to handle the case where a lh and rh endpoint + ### coincide... + # decomposition -- + # flatten: + # every lh endpoint generates (lh-1, lh) + # every rh endpoint generates (rh, rh+) + foreach (@ints) { + $flat{$$_[0]-1}++; + $flat{$$_[0]}++; + $flat{$$_[1]}++; + $flat{$$_[1]+1}++; + } + # sort, create singletons if nec. + my @a; + @a = sort {$a<=>$b} keys %flat; + # throw out first and last (meeting a boundary condition) + shift @a; pop @a; + # look for singletons + @flat = (shift @a, shift @a); + if ( $flat[1]-$flat[0] == 1 ) { + @flat = ($flat[0],$flat[0], $flat[1]); + } + while (my $a = shift @a) { + if ($a-$flat[-2]==2) { + push @flat, $flat[-1]; # create singleton interval + } + push @flat, $a; + } + # component intervals are consecutive pairs + my @decomp; + while (my $a = shift @flat) { + push @decomp, [$a, shift @flat]; + } + + # for each component, return a list of the indices of the input intervals + # that cover the component. + my @coverage; + foreach my $i (0..$#decomp) { + foreach my $j (0..$#ints) { + unless (are_disjoint($decomp[$i], $ints[$j])) { + if (defined $coverage[$i]) { + push @{$coverage[$i]}, $j; + } + else { + $coverage[$i] = [$j]; + } + } + } + } @@ Diff output truncated at 10000 characters.
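The r15693 commit documents `interval_tiling()` as returning `( [$interval => [ @indices_of_collapsed_input_intervals ]], ... )`. For comparison, the sketch below builds the same kind of tiling with a single sort-and-sweep pass instead of the repeated pairwise passes in the committed loop. It is a swapped-in technique, not the MapTileUtils code: `tile_intervals` is a hypothetical name, and it assumes closed intervals `[$a0, $a1]` with `$a0 <= $a1`, where intervals that merely share an endpoint count as overlapping (how the module's own `are_disjoint()` treats that case is not shown in the diff).

```perl
use strict;
use warnings;

# Hypothetical helper: collapse closed intervals [$a0, $a1] ($a0 <= $a1)
# into a minimal set of disjoint covering intervals. Each result has the
# shape [ [$lo, $hi] => [ @input_indices ] ], mirroring interval_tiling().
# Sort-and-sweep sketch; not the multi-pass loop used in MapTileUtils.pm.
sub tile_intervals {
    my ($ints) = @_;
    return () unless $ints && @$ints;

    # Visit input indices in order of their left endpoints.
    my @order = sort { $ints->[$a][0] <=> $ints->[$b][0] } 0 .. $#$ints;

    my @tiles;
    for my $i (@order) {
        my ($lo, $hi) = @{ $ints->[$i] };
        if (@tiles && $lo <= $tiles[-1][0][1]) {
            # Overlaps (or touches) the current tile: extend it if needed.
            $tiles[-1][0][1] = $hi if $hi > $tiles[-1][0][1];
            push @{ $tiles[-1][1] }, $i;
        }
        else {
            # Disjoint from everything seen so far: start a new tile.
            push @tiles, [ [ $lo, $hi ], [$i] ];
        }
    }
    return @tiles;
}

my @tiles = tile_intervals([ [1, 10], [20, 30], [5, 15], [30, 35] ]);
for my $t (@tiles) {
    printf "[%d, %d] <= inputs %s\n", @{ $t->[0] }, join(',', @{ $t->[1] });
}
```

Because the inputs are visited in order of left endpoint, each tile can only grow to the right, so one pass suffices; the price is an O(n log n) sort, and the index lists come out in left-endpoint order rather than input order.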
@@ From cjfields at dev.open-bio.org Tue May 19 15:27:35 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 19 May 2009 15:27:35 -0400 Subject: [Bioperl-guts-l] [15694] bioperl-live/trunk/Bio/Search: initial crack at parsing Infernal 1.0 output Message-ID: <200905191927.n4JJRZZx015243@dev.open-bio.org> Revision: 15694 Author: cjfields Date: 2009-05-19 15:27:35 -0400 (Tue, 19 May 2009) Log Message: ----------- initial crack at parsing Infernal 1.0 output Modified Paths: -------------- bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm bioperl-live/trunk/Bio/SearchIO/infernal.pm Modified: bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm =================================================================== --- bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm 2009-05-19 19:24:02 UTC (rev 15693) +++ bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm 2009-05-19 19:27:35 UTC (rev 15694) @@ -76,6 +76,7 @@ package Bio::Search::HSP::ModelHSP; use strict; use Bio::Seq::Meta; +use Data::Dumper; use base qw(Bio::Search::HSP::GenericHSP); @@ -226,6 +227,7 @@ } require Bio::LocatableSeq; my $id = $seqType =~ /^q/i ?
$self->query->seq_id : $self->hit->seq_id; + $str =~ s{\*\[\s*(\d+)\s*\]\*}{$1 x 'N'}ge; my $seq = Bio::LocatableSeq->new (-ID => $id, -START => $self->start($seqType), -END => $self->end($seqType), @@ -355,13 +357,22 @@ require Bio::LocatableSeq; require Bio::SimpleAlign; my $aln = Bio::SimpleAlign->new; - my $hs = $self->hit_string(); - my $qs = $self->query_string(); - if (!$qs) { + my %hsp = (hit => $self->hit_string, + midline => $self->homology_string, + query => $self->query_string, + meta => $self->meta); + + # this takes care of infernal issues + if ($hsp{meta} && $hsp{meta} =~ m{~+}) { + $self->_postprocess_hsp(\%hsp); + } + + if (!$hsp{query}) { $self->warn("Missing query string, can't build alignment"); return; } - my $seqonly = $qs; + + my $seqonly = $hsp{query}; $seqonly =~ s/[\-\s]//g; my ($q_nm,$s_nm) = ($self->query->seq_id(), $self->hit->seq_id()); @@ -371,23 +382,23 @@ unless( defined $s_nm && CORE::length ($s_nm) ) { $s_nm = 'hit'; } - my $query = Bio::LocatableSeq->new('-seq' => $qs, + my $query = Bio::LocatableSeq->new('-seq' => $hsp{query}, '-id' => $q_nm, '-start' => $self->query->start, '-end' => $self->query->end, ); - $seqonly = $hs; + $seqonly = $hsp{hit}; $seqonly =~ s/[\-\s]//g; - my $hit = Bio::LocatableSeq->new('-seq' => $hs, + my $hit = Bio::LocatableSeq->new('-seq' => $hsp{hit}, '-id' => $s_nm, '-start' => $self->hit->start, '-end' => $self->hit->end, ); $aln->add_seq($query); $aln->add_seq($hit); - if ($self->meta) { + if ($hsp{meta}) { my $meta_obj = Bio::Seq::Meta->new(); - $meta_obj->named_meta('ss_cons', $self->meta); + $meta_obj->named_meta('ss_cons', $hsp{meta}); $aln->consensus_meta($meta_obj); } return $aln; @@ -568,4 +579,42 @@ return; } +############## PRIVATE ############## + +# the following method postprocesses HSP data in cases where the sequences +# aren't complete (which can trigger a validation error) + +{ + my $SEQ_REGEX = qr/\*\[\s*(\d+)\s*\]\*/; + my $META_REGEX = qr/(~+)/; + +sub _postprocess_hsp { + my 
($self, $hsp) = @_; + $self->throw('Must pass a hash ref for HSP processing') unless ref($hsp) eq 'HASH'; + my @ins; + for my $type (qw(query hit meta)) { + my $str = $hsp->{$type}; + my $regex = $type eq 'meta' ? $META_REGEX : $SEQ_REGEX; + my $ind = 0; + while ($str =~ m{$regex}g) { + $ins[$ind]->{$type} = {pos => pos($str) - length($1), str => $1}; + } + } + for my $chunk (reverse @ins) { + my ($max, $min) = ($chunk->{hit}->{str} >= $chunk->{query}->{str}) ? + ('hit', 'query') : ('query', 'hit'); + my %rep; + $rep{$max} = 'N' x $chunk->{$max}->{str}; + $rep{$min} = 'N' x $chunk->{$min}->{str}. + ('-'x($chunk->{$max}->{str}-$chunk->{$min}->{str})); + $rep{'meta'} = '~' x $chunk->{$max}->{str}; + $rep{'midline'} = ' ' x $chunk->{$max}->{str}; + for my $t (qw(hit query meta midline)) { + substr($hsp->{$t}, $chunk->{meta}->{pos}, length($chunk->{meta}->{str}) , $rep{$t}); + } + } +} + +} + 1; Modified: bioperl-live/trunk/Bio/SearchIO/infernal.pm =================================================================== --- bioperl-live/trunk/Bio/SearchIO/infernal.pm 2009-05-19 19:24:02 UTC (rev 15693) +++ bioperl-live/trunk/Bio/SearchIO/infernal.pm 2009-05-19 19:27:35 UTC (rev 15694) @@ -31,13 +31,15 @@ =head1 DESCRIPTION -This is a highly experimental SearchIO-based parser for Infernal -output from the cmsearch program. It currently parses cmsearch output -for Infernal versions 0.7-0.81; older versions may work but will not -be supported. After the first stable version is released (and output -has stabilized) it is very likely support for the older pre-v.1 -developer releases will be dropped. +This is a SearchIO-based parser for Infernal output from the cmsearch program. +It currently parses cmsearch output for Infernal versions 0.7-1.0; older +versions may work but will not be supported. +As the first stable version has been released (and output has stabilized) it is +highly recommended that users upgrade to using the latest Infernal release. 
+Support for the older pre-v.1 developer releases will be dropped for future core +1.6 releases. + =head1 FEEDBACK =head2 Mailing Lists @@ -138,7 +140,7 @@ my $MINSCORE = 0; my $DEFAULT_ALGORITHM = 'cmsearch'; -my $DEFAULT_VERSION = '0.72'; +my $DEFAULT_VERSION = '1.0'; my @VALID_SYMBOLS = qw(5-prime 3-prime single-strand unknown gap); my %STRUCTURE_SYMBOLS = ( @@ -243,16 +245,26 @@ # advance to first line next if $line =~ m{^\s*$}; # newer output starts with model name - if ($line =~ m{^CM\s\d+:}) { - $self->{'_handlerset'} = 'new'; + if ($line =~ m{^\#\s+cmsearch\s}) { + $self->{'_handlerset'} = 'latest'; + } elsif ($line =~ m{^CM\s\d+:}) { + $self->{'_handlerset'} = 'pre-1.0'; } else { $self->{'_handlerset'} ='old'; } last; } $self->_pushback($line); + #if ($self->{'_handlerset'} ne '1.0') { + # $self->deprecated( + # -message => "Parsing of Infernal pre-1.0 release is deprecated;\n". + # "upgrading to Infernal 1.0 or above is highly recommended", + # -version => 1.007); + #} } - return $self->{'_handlerset'} eq 'new' ? $self->_parse_new : $self->_parse_old; + return ($self->{'_handlerset'} eq 'latest') ? $self->_parse_latest : + ($self->{'_handlerset'} eq 'pre-1.0') ? $self->_parse_pre : + $self->_parse_old; } =head2 start_element @@ -295,7 +307,6 @@ Returns : none Args : hashref with at least 2 keys, 'Data' and 'Name' - =cut sub end_element { @@ -686,11 +697,210 @@ # this is a hack which guesses the format and sets the handler for parsing in # an instance; it'll be taken out when infernal 1.0 is released -# cmsearch 0.81 -sub _parse_new { +sub _parse_latest { my ($self) = @_; my $seentop = 0; local $/ = "\n"; + my ($accession, $description) = ($self->query_accession, $self->query_description); + my ($maxscore, $mineval, $minpval); + $self->start_document(); + my ($lasthit, $lastscore, $lasteval, $lastpval, $laststart, $lastend); + PARSER: + while (my $line = $self->_readline) { + next if $line =~ m{^\s+$}; + # stats aren't parsed yet... 
+ if ($line =~ m{^\#\s+cmsearch}xms) { + $seentop = 1; + $self->start_element({'Name' => 'Result'}); + $self->element_hash({ + 'Infernal_program' => 'CMSEARCH' + }); + } + elsif ($line =~ m{^\#\sINFERNAL\s+(\d+\.\d+)}xms) { + $self->element_hash({ + 'Infernal_version' => $1, + }); + } + elsif ($line =~ m{^\#\scommand:.*?\s(\S+)$}xms) { + $self->element_hash({ + 'Infernal_db' => $1, + }); + } + elsif ($line =~ m{^\#\s+dbsize\(Mb\):\s+(\d+\.\d+)}xms) { + # store absolute DB length + $self->element_hash({ + 'Infernal_db-let' => $1 * 1e6 + }); + } + elsif ($line =~ m{^CM(?:\s(\d+))?:\s*(\S+)}xms) { + # not sure, but it's possible single reports may contain multiple + # models; if so, they should be rolled over into a new ResultI + #print STDERR "ACC: $accession\nDESC: $description\n"; + $self->element_hash({ + 'Infernal_query-def' => $2, # present in output now + 'Infernal_query-acc' => $accession, + 'Infernal_querydesc' => $description + }); + } + elsif ($line =~ m{^>\s*(\S+)} ){ + #$self->debug("Start Hit: Found hit:$1\n"); + if ($self->in_element('hit')) { + $self->element_hash({'Hit_score' => $maxscore, + 'Hit_bits' => $maxscore}); + ($maxscore, $minpval, $mineval) = undef; + $self->end_element({'Name' => 'Hit'}); + } + $lasthit = $1; + } + elsif ($line =~ m{ + ^\sQuery\s=\s\d+\s-\s\d+,\s # Query start/end + Target\s=\s(\d+)\s-\s(\d+) # Target start/end + }xmso) { + # Query (model) start/end always the same, determined from + # the HSP length + ($laststart, $lastend) = ($1, $2); + #$self->debug("Found hit coords:$laststart - $lastend\n"); + } elsif ($line =~ m{ + ^\sScore\s=\s([\d\.]+),\s # Score = Bitscore (for now) + (?:E\s=\s([\d\.e-]+),\s # E-val optional + P\s=\s([\d\.e-]+),\s)? 
# P-val optional + GC\s= # GC not captured + }xmso + ) { + ($lastscore, $lasteval, $lastpval) = ($1, $2, $3); + #$self->debug(sprintf("Found hit data:Score:%s,Eval:%s,Pval:%s\n",$lastscore, $lasteval||'', $lastpval||'')); + $maxscore ||= $lastscore; + if ($lasteval && $lastpval) { + $mineval ||= $lasteval; + $minpval ||= $lastpval; + $mineval = ($mineval > $lasteval) ? $lasteval : + $mineval; @@ Diff output truncated at 10000 characters. @@ From cjfields at dev.open-bio.org Tue May 19 16:05:44 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 19 May 2009 16:05:44 -0400 Subject: [Bioperl-guts-l] [15695] bioperl-live/trunk/t: add some tests (some still failing, be forewarned) Message-ID: <200905192005.n4JK5iAU015394@dev.open-bio.org> Revision: 15695 Author: cjfields Date: 2009-05-19 16:05:44 -0400 (Tue, 19 May 2009) Log Message: ----------- add some tests (some still failing, be forewarned) Modified Paths: -------------- bioperl-live/trunk/t/SearchIO/infernal.t Added Paths: ----------- bioperl-live/trunk/t/data/test2.infernal Modified: bioperl-live/trunk/t/SearchIO/infernal.t =================================================================== --- bioperl-live/trunk/t/SearchIO/infernal.t 2009-05-19 19:27:35 UTC (rev 15694) +++ bioperl-live/trunk/t/SearchIO/infernal.t 2009-05-19 20:05:44 UTC (rev 15695) @@ -7,16 +7,251 @@ use lib '.'; use Bio::Root::Test; - test_begin(-tests => 316); + test_begin(-tests => 412); use_ok('Bio::SearchIO'); } my ($searchio, $result, $iter, $hit, $hsp, $algorithm, $meta); -### Infernal #### +### Infernal v. 
1.0 #### $searchio = Bio::SearchIO->new( -format => 'infernal', + -file => test_input_file('test2.infernal'), + -hsp_minscore => 40, + -verbose => 1 + # version is reset to the correct one by parser + -model => 'Foo', + -query_acc => 'RF01234', + -query_desc => 'tRNA', + #-convert_meta => 0, + ); + +$result = $searchio->next_result; +isa_ok($result, 'Bio::Search::Result::ResultI'); +is($result->algorithm, 'CMSEARCH', "Result"); +is($result->algorithm_reference, undef, "Result reference"); +is($result->algorithm_version, '1.0', "Result version"); +is($result->available_parameters, 0, "Result parameters"); +is($result->available_statistics, 0, "Result statistics"); +is($result->database_entries, '', "Result entries"); +is($result->database_letters, 600000, "Result letters"); +is($result->database_name, 'tosearch.300Kb.db', + "Result database_name"); +is($result->num_hits, 1, "Result num_hits"); +is($result->program_reference, undef, "Result program_reference"); +is($result->query_accession, 'RF01234', "Result query_accession"); +is($result->query_description, 'my RNA ', "Result query_description"); +is($result->query_length, 72, "Result query_length"); +is($result->query_name, 'trna.5-1', "Result query_name"); + +$hit = $result->next_hit; + +isa_ok($hit, 'Bio::Search::Hit::HitI'); +is($hit->ncbi_gi, '', "Hit GI"); +is($hit->accession, 'example', "Hit accession"); +is($hit->algorithm, 'CMSEARCH', "Hit algorithm"); +is($hit->bits, '78.06', "Hit bits"); +is($hit->description, '', "Hit description"); # no hit descs yet +is($hit->locus, '', "Hit locus"); +is($hit->n, 3, "Hit n"); +is($hit->name, 'example', "Hit name"); +is($hit->num_hsps, 3, "Hit num_hsps"); + +# These Bio::Search::Hit::HitI methods are currently unimplemented in +# Bio::Search::Hit::ModelHit; they may be integrated over time but will require +# some reconfiguring for Model-based searches + +# these need to be replaced by dies_ok() or warnings_like() +warning_like { $hit->length_aln() } + qr'length_aln 
not implemented for Model-based searches', + "Hit length_aln() not implemented"; +warning_like {$hit->num_unaligned_hit} + qr'num_unaligned_hit/num_unaligned_sbjct not implemented for Model-based searches', + "Hit num_unaligned_hit() not implemented"; +warning_like {$hit->num_unaligned_query} + qr'num_unaligned_query not implemented for Model-based searches', + "Hit num_unaligned_query() not implemented"; +warning_like {$hit->num_unaligned_sbjct} + qr'num_unaligned_hit/num_unaligned_sbjct not implemented for Model-based searches', + "Hit num_unaligned_sbjct() not implemented"; +warning_like {$hit->start} + qr'start not implemented for Model-based searches', + 'Hit start not implemented'; +warning_like {$hit->end} + qr'end not implemented for Model-based searches', + 'Hit end not implemented'; +warning_like {$hit->strand} + qr'strand not implemented for Model-based searches', + 'Hit strand not implemented'; +warning_like {$hit->logical_length} + qr'logical_length not implemented for Model-based searches', + 'Hit logical_length not implemented'; +warning_like {$hit->frac_aligned_hit} + qr'frac_aligned_hit not implemented for Model-based searches', + 'Hit frac_aligned_hit not implemented'; +warning_like {$hit->frac_aligned_query} + qr'frac_aligned_query not implemented for Model-based searches', + 'Hit frac_aligned_query not implemented'; +warning_like {$hit->frac_conserved} + qr'frac_conserved not implemented for Model-based searches', + 'Hit frac_conserved not implemented'; +warning_like {$hit->frac_identical} + qr'frac_identical not implemented for Model-based searches', + 'Hit frac_identical not implemented'; +warning_like {$hit->matches} + qr'matches not implemented for Model-based searches', + 'Hit matches not implemented'; +warning_like {$hit->gaps} + qr'gaps not implemented for Model-based searches', + 'Hit gaps not implemented'; +warning_like {$hit->frame} + qr'frame not implemented for Model-based searches', + 'Hit frame not implemented'; +warning_like 
{$hit->range} + qr'range not implemented for Model-based searches', + 'Hit range not implemented'; +warning_like {$hit->seq_inds} + qr'seq_inds not implemented for Model-based searches', + 'Hit seq_inds not implemented'; + +is($hit->length, 0, "Hit length"); +is($hit->overlap, 0, "Hit overlap"); +is($hit->query_length, 72, "Hit query_length"); +is($hit->rank, 1, "Hit rank"); +is($hit->raw_score, '78.06', "Hit raw_score"); +is($hit->score, '78.06', "Hit score"); +float_is($hit->p, '11.10', "Hit p"); +float_is($hit->significance, '3.133e-21'); + +$hsp = $hit->next_hsp; +isa_ok($hsp, 'Bio::Search::HSP::HSPI'); +is($hsp->algorithm, 'CMSEARCH', "HSP algorithm"); +float_is($hsp->evalue, '3.133e-21'); +isa_ok($hsp->feature1, 'Bio::SeqFeature::Similarity'); +isa_ok($hsp->feature2, 'Bio::SeqFeature::Similarity'); +($meta) = $hsp->feature1->get_tag_values('meta'); +is($meta, '(((((((,,<<<<___.____>>>>,<<<<<_______>>>>>,,,,,<<<<<_______>>>>>))))))):'); +($meta) = $hsp->feature2->get_tag_values('meta'); +is($meta, '(((((((,,<<<<___.____>>>>,<<<<<_______>>>>>,,,,,<<<<<_______>>>>>))))))):'); + +is($hsp->frame('query'), 0, "HSP frame"); +is($hsp->gaps, 0, "HSP gaps"); +is($hit->length, 0, "Hit length"); +isa_ok($hsp->get_aln, 'Bio::Align::AlignI'); +isa_ok($hsp->hit, 'Bio::SeqFeature::Similarity', "HSP hit"); +is($hsp->hit_string, + 'GCGGAUUUAGCUCAGUuGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCA', + "HSP hit_string"); +is($hsp->homology_string, + 'GC::A::UAGC:CAGU GG AG:GCGCCAG:CUG+++A:CUGGAGGUCC:G:GUUCGAU C:C:G::U::GCA', + "HSP homology_string"); +is($hsp->hsp_group, undef, "HSP hsp_group"); +is($hsp->hsp_length, 73, "HSP hsp_length"); +is($hsp->length, 73, "HSP length"); +is($hsp->links, undef, "HSP links"); +is($hsp->n, '', "HSP n"); +float_is($hsp->pvalue, 2.906e-26, "HSP pvalue"); +isa_ok($hsp->query, 'Bio::SeqFeature::Similarity', "HSP query"); +is($hsp->query_string, + 'gCcgacAUaGcgcAgU.GGuAgcgCgccagccUgucAagcuggAGgUCCgggGUUCGAUuCcccGUgucgGca', + "HSP 
query_string"); +is($hsp->range, 72, "HSP range"); +is($hsp->rank, 1, "HSP rank"); +float_is($hsp->significance, 3.133e-21); +is($hsp->end, 72, "HSP end"); +float_is($hsp->expect, '3.133e-21', "HSP expect"); + +# These Bio::Search::HSP::HSPI methods are currently unimplemented in +# Bio::Search::HSP::ModelHSP; they may be integrated over time but will require +# some reconfiguring for Model-based searches + +warning_like {$hsp->seq_inds} + qr'seq_inds not implemented for Model-based searches', + 'HSP seq_inds not implemented'; +warning_like {$hsp->matches} + qr'matches not implemented for Model-based searches', + 'HSP matches not implemented'; +warning_like {$hsp->frac_conserved} + qr'frac_conserved not implemented for Model-based searches', + 'HSP frac_conserved not implemented'; +warning_like {$hsp->frac_identical} + qr'frac_identical not implemented for Model-based searches', + 'HSP frac_identical not implemented'; +warning_like {$hsp->num_conserved} + qr'num_conserved not implemented for Model-based searches', + 'HSP num_conserved not implemented'; +warning_like {$hsp->num_identical} + qr'num_identical not implemented for Model-based searches', + 'HSP num_identical not implemented'; +warning_like {$hsp->percent_identity} + qr'percent_identity not implemented for Model-based searches', + 'HSP percent_identity not implemented'; +warning_like {$hsp->cigar_string} + qr'cigar_string not implemented for Model-based searches', + 'HSP cigar_string not implemented'; +warning_like {$hsp->generate_cigar_string} + qr'generate_cigar_string not implemented for Model-based searches', + 'HSP cigar_string not implemented'; + +isa_ok($hsp->seq, 'Bio::LocatableSeq'); +is($hsp->seq_str, + 'gCcgacAUaGcgcAgU.GGuAgcgCgccagccUgucAagcuggAGgUCCgggGUUCGAUuCcccGUgucgGca', + "HSP seq_str"); +is($hsp->start, 1, "HSP start"); +is($hsp->custom_score, undef, "HSP custom_score"); +is($hsp->meta, + '(((((((,,<<<<___.____>>>>,<<<<<_______>>>>>,,,,,<<<<<_______>>>>>))))))):', + "HSP meta"); 
+is($hsp->strand('hit'), 1, "HSP strand"); + +$hsp = $hit->next_hsp; +isa_ok($hsp, 'Bio::Search::HSP::HSPI'); +is($hsp->algorithm, 'CMSEARCH', "HSP algorithm"); +float_is($hsp->evalue, 0.6752); +isa_ok($hsp->feature1, 'Bio::SeqFeature::Similarity'); +isa_ok($hsp->feature2, 'Bio::SeqFeature::Similarity'); +is($hsp->frame('query'), 0, "HSP frame"); +is($hsp->gaps, 4, "HSP gaps"); +# infernal can return alignment data +isa_ok($hsp->get_aln, 'Bio::Align::AlignI'); +isa_ok($hsp->hit, 'Bio::SeqFeature::Similarity', "HSP hit"); +is($hsp->hit_string, + 'UCUGCUAUGGCGUAAUGGCCACGCGC----CCAUCAACAAAGAUAUC*[19]*UAACAGGA', + "HSP hit_string"); +is($hsp->homology_string, + ' C:G :AU+GCG:A+UGG :CGCGC C UCAA +++GA +UC U: C:G A', + "HSP homology_string"); +is($hsp->hsp_group, undef, "HSP hsp_group"); +is($hsp->hsp_length, 73, "HSP hsp_length"); +is($hsp->length, 73, "HSP length"); +is($hsp->links, undef, "HSP links"); +is($hsp->n, '', "HSP n"); +float_is($hsp->pvalue, 6.263e-06, "HSP pvalue"); +isa_ok($hsp->query, 'Bio::SeqFeature::Similarity', "HSP query"); +is($hsp->query_string, + 'gCcgacAUaGcgcAgUGGuAgcgCgccagccUgucAagcuggAGgUC*[17]*UgucgGca', + "HSP query_string"); +is($hsp->range, 72, "HSP range"); @@ Diff output truncated at 10000 characters. 
@@ From cjfields at dev.open-bio.org Tue May 19 16:42:10 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 19 May 2009 16:42:10 -0400 Subject: [Bioperl-guts-l] [15696] bioperl-live/trunk/Bio/Search/Hit: * remove redundant code Message-ID: <200905192042.n4JKgAN4015689@dev.open-bio.org> Revision: 15696 Author: cjfields Date: 2009-05-19 16:42:09 -0400 (Tue, 19 May 2009) Log Message: ----------- * remove redundant code * p() should default to significance, not expect Modified Paths: -------------- bioperl-live/trunk/Bio/Search/Hit/GenericHit.pm bioperl-live/trunk/Bio/Search/Hit/ModelHit.pm Modified: bioperl-live/trunk/Bio/Search/Hit/GenericHit.pm =================================================================== --- bioperl-live/trunk/Bio/Search/Hit/GenericHit.pm 2009-05-19 20:05:44 UTC (rev 15695) +++ bioperl-live/trunk/Bio/Search/Hit/GenericHit.pm 2009-05-19 20:42:09 UTC (rev 15696) @@ -130,7 +130,7 @@ my $self = $class->SUPER::new(@args); my ($hsps, $name,$query_len,$desc, $acc, $locus, $length, - $score,$algo,$signif,$bits, + $score,$algo,$signif,$bits, $p, $rank, $hsp_factory, $gi) = $self->_rearrange([qw(HSPS NAME QUERY_LEN @@ -138,7 +138,7 @@ ACCESSION LOCUS LENGTH SCORE ALGORITHM - SIGNIFICANCE BITS + SIGNIFICANCE BITS P RANK HSP_FACTORY NCBI_GI)], @args); @@ -162,6 +162,10 @@ defined $rank && $self->rank($rank); defined $hsp_factory && $self->hsp_factory($hsp_factory); defined $gi && $self->ncbi_gi($gi); + # p() has a weird interface, so this is a hack workaround + if (defined $p) { + $self->{_p} = $p; + } $self->{'_iterator'} = 0; if( defined $hsps ) { @@ -631,7 +635,7 @@ : That is, floats are not converted into sci notation before : splitting into parts. -See Also : L, L, L +See Also : L, L, L =cut @@ -645,8 +649,8 @@ if(!defined $val) { # P-value not defined, must be a NCBI Blast2 report. # Use expect instead. - $self->warn( "P-value not defined. 
Using expect() instead."); - $val = $self->{'_expect'}; + $self->warn( "P-value not defined. Using significance() instead."); + $val = $self->significance(); } return $val if not $fmt or $fmt =~ /^raw/i; Modified: bioperl-live/trunk/Bio/Search/Hit/ModelHit.pm =================================================================== --- bioperl-live/trunk/Bio/Search/Hit/ModelHit.pm 2009-05-19 20:05:44 UTC (rev 15695) +++ bioperl-live/trunk/Bio/Search/Hit/ModelHit.pm 2009-05-19 20:42:09 UTC (rev 15696) @@ -396,28 +396,6 @@ =cut -sub p { - # Some duplication of logic for p(), expect() and signif() for the sake of performance. - my ($self, $fmt) = @_; - - my $val = $self->{'_p'}; - - # $val can be zero. - if(not defined $val) { - # Use expect instead. - $self->warn( "P-value not defined. Using expect() instead."); - $val = $self->{'_expect'}; - } - - return $val if not $fmt or $fmt =~ /^raw/i; - ## Special formats: exponent-only or as list. - return &Bio::Search::SearchUtils::get_exponent($val) if $fmt =~ /^exp/i; - return (split (/eE/, $val)) if $fmt =~ /^parts/i; - - ## Default: return the raw P-value. 
- return $val; -} - =head2 hsp Usage : $hit_object->hsp( [string] ); From cjfields at dev.open-bio.org Tue May 19 17:00:23 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 19 May 2009 17:00:23 -0400 Subject: [Bioperl-guts-l] [15697] bioperl-live/trunk: tests pass, but the code needs some extra tweaking Message-ID: <200905192100.n4JL0NRK015722@dev.open-bio.org> Revision: 15697 Author: cjfields Date: 2009-05-19 17:00:23 -0400 (Tue, 19 May 2009) Log Message: ----------- tests pass, but the code needs some extra tweaking Modified Paths: -------------- bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm bioperl-live/trunk/Bio/SearchIO/infernal.pm bioperl-live/trunk/t/SearchIO/infernal.t Modified: bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm =================================================================== --- bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm 2009-05-19 20:42:09 UTC (rev 15696) +++ bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm 2009-05-19 21:00:23 UTC (rev 15697) @@ -76,7 +76,6 @@ package Bio::Search::HSP::ModelHSP; use strict; use Bio::Seq::Meta; -use Data::Dumper; use base qw(Bio::Search::HSP::GenericHSP); Modified: bioperl-live/trunk/Bio/SearchIO/infernal.pm =================================================================== --- bioperl-live/trunk/Bio/SearchIO/infernal.pm 2009-05-19 20:42:09 UTC (rev 15696) +++ bioperl-live/trunk/Bio/SearchIO/infernal.pm 2009-05-19 21:00:23 UTC (rev 15697) @@ -124,6 +124,7 @@ 'Hit_accession' => 'HIT-accession', 'Hit_def' => 'HIT-description', 'Hit_signif' => 'HIT-significance', # evalues only in v0.81, optional + 'Hit_p' => 'HIT-p', # pvalues in 1.0, optional 'Hit_score' => 'HIT-score', # best HSP bit score 'Hit_bits' => 'HIT-bits', # best HSP bit score @@ -211,11 +212,12 @@ -verbose => $self->verbose ) ); - $model && $self->model($model); - $database && $self->database($database); - $accession && $self->query_accession($accession); - $convert && $self->convert_meta($convert); - $desc && 
$self->query_description($desc); + + defined $model && $self->model($model); + defined $database && $self->database($database); + defined $accession && $self->query_accession($accession); + defined $convert && $self->convert_meta($convert); + defined $desc && $self->query_description($desc); $version ||= $DEFAULT_VERSION; $self->version($version); @@ -884,7 +886,8 @@ $self->element_hash({'Hit_score' => $maxscore, 'Hit_bits' => $maxscore}); # don't know where to put minpval yet - $self->element_hash({'Hit_signif' => $mineval}) if $mineval; + $self->element_hash({'Hit_signif' => $mineval}) if $mineval; + $self->element_hash({'Hit_p' => $minpval}) if $minpval; $self->end_element({'Name' => 'Hit'}); } last PARSER; Modified: bioperl-live/trunk/t/SearchIO/infernal.t =================================================================== --- bioperl-live/trunk/t/SearchIO/infernal.t 2009-05-19 20:42:09 UTC (rev 15696) +++ bioperl-live/trunk/t/SearchIO/infernal.t 2009-05-19 21:00:23 UTC (rev 15697) @@ -12,19 +12,15 @@ use_ok('Bio::SearchIO'); } -my ($searchio, $result, $iter, $hit, $hsp, $algorithm, $meta); +my ($result, $iter, $hit, $hsp, $algorithm, $meta); ### Infernal v. 
1.0 #### -$searchio = Bio::SearchIO->new( -format => 'infernal', +my $searchio = Bio::SearchIO->new( -format => 'infernal', -file => test_input_file('test2.infernal'), - -hsp_minscore => 40, - -verbose => 1 - # version is reset to the correct one by parser - -model => 'Foo', + -model => 'tRNAtest', -query_acc => 'RF01234', -query_desc => 'tRNA', - #-convert_meta => 0, ); $result = $searchio->next_result; @@ -41,7 +37,7 @@ is($result->num_hits, 1, "Result num_hits"); is($result->program_reference, undef, "Result program_reference"); is($result->query_accession, 'RF01234', "Result query_accession"); -is($result->query_description, 'my RNA ', "Result query_description"); +is($result->query_description, 'tRNA', "Result query_description"); is($result->query_length, 72, "Result query_length"); is($result->query_name, 'trna.5-1', "Result query_name"); @@ -121,7 +117,7 @@ is($hit->rank, 1, "Hit rank"); is($hit->raw_score, '78.06', "Hit raw_score"); is($hit->score, '78.06', "Hit score"); -float_is($hit->p, '11.10', "Hit p"); +float_is($hit->p, '2.906e-26', "Hit p"); float_is($hit->significance, '3.133e-21'); $hsp = $hit->next_hsp; @@ -296,7 +292,7 @@ # p() works but there are no evalues yet for Infernal output, so catch and check... warning_like {$hit->p} - qr'P-value not defined. Using expect\(\) instead', + qr'P-value not defined. Using significance\(\) instead', "No p values"; is($hit->length, 0, "Hit length"); @@ -538,7 +534,6 @@ is($result->query_name, 'Purine', "Result query_name"); $hit = $result->next_hit; -$hit->verbose(2); isa_ok($hit, 'Bio::Search::Hit::HitI'); is($hit->ncbi_gi, '633168', "Hit GI"); is($hit->accession, 'X83878.1', "Hit accession"); @@ -551,9 +546,8 @@ is($hit->num_hsps, 2, "Hit num_hsps"); # p() works but there are no evalues yet for Infernal output, so catch and check... -eval {$hit->p}; -like($@, qr'P-value not defined. Using expect\(\) instead', - "No p values"); +warnings_like {$hit->p} qr'P-value not defined. 
Using significance\(\) instead', + "No p values"; is($hit->length, 0, "Hit length"); is($hit->overlap, 0, "Hit overlap"); From cjfields at dev.open-bio.org Tue May 19 17:23:10 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 19 May 2009 17:23:10 -0400 Subject: [Bioperl-guts-l] [15698] bioperl-live/trunk/Bio/Search: some cleanup and small bug fixes Message-ID: <200905192123.n4JLNA7V015942@dev.open-bio.org> Revision: 15698 Author: cjfields Date: 2009-05-19 17:23:10 -0400 (Tue, 19 May 2009) Log Message: ----------- some cleanup and small bug fixes Modified Paths: -------------- bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm bioperl-live/trunk/Bio/SearchIO/infernal.pm Modified: bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm =================================================================== --- bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm 2009-05-19 21:00:23 UTC (rev 15697) +++ bioperl-live/trunk/Bio/Search/HSP/ModelHSP.pm 2009-05-19 21:23:10 UTC (rev 15698) @@ -226,7 +226,8 @@ } require Bio::LocatableSeq; my $id = $seqType =~ /^q/i ? $self->query->seq_id : $self->hit->seq_id; - $str =~ s{\*\[\s*(\d+)\s*\]\*}{$1 x 'N'}ge; + $str =~ s{\*\[\s*(\d+)\s*\]\*}{'N' x $1}ge; + $str =~ s{\s+}{}g; my $seq = Bio::LocatableSeq->new (-ID => $id, -START => $self->start($seqType), -END => $self->end($seqType), @@ -592,11 +593,13 @@ $self->throw('Must pass a hash ref for HSP processing') unless ref($hsp) eq 'HASH'; my @ins; for my $type (qw(query hit meta)) { + $hsp->{$type} =~ s{\s+$}{}; my $str = $hsp->{$type}; my $regex = $type eq 'meta' ? 
$META_REGEX : $SEQ_REGEX; my $ind = 0; while ($str =~ m{$regex}g) { $ins[$ind]->{$type} = {pos => pos($str) - length($1), str => $1}; + $ind++; } } for my $chunk (reverse @ins) { Modified: bioperl-live/trunk/Bio/SearchIO/infernal.pm =================================================================== --- bioperl-live/trunk/Bio/SearchIO/infernal.pm 2009-05-19 21:00:23 UTC (rev 15697) +++ bioperl-live/trunk/Bio/SearchIO/infernal.pm 2009-05-19 21:23:10 UTC (rev 15698) @@ -878,10 +878,6 @@ # result now ends with // and 'Fin' } elsif ($line =~ m{^//}xms ) { if ($self->within_element('result') && $seentop) { - #$self->element( - # {'Name' => 'Infernal_version', - # 'Data' => $version} - # ); if ($self->in_element('hit')) { $self->element_hash({'Hit_score' => $maxscore, 'Hit_bits' => $maxscore}); From bugzilla-daemon at portal.open-bio.org Tue May 19 19:28:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 19 May 2009 19:28:23 -0400 Subject: [Bioperl-guts-l] [Bug 2830] New: Bio::Search::Tiling development/use case data Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2830 Summary: Bio::Search::Tiling development/use case data Product: BioPerl Version: main-trunk Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: bioperl-dev AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: maj at fortinbras.us Bio::Search::Tiling tries to provide robust facilities for creating contigs and generating accurate hit-wide statistics from sets of high-scoring pairs resulting from BLAST analysis. This "bug" is to provide a convenient and trackable place for users to comment/patch/send example data to put Bio::Search::Tiling through its paces during its development. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
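Revision 15698 corrects the gap-marker expansion in `Bio::Search::HSP::ModelHSP`. Perl's `x` operator takes the string on the left and the repeat count on the right, so the old `$1 x 'N'` repeated the captured digits zero times ('N' numifies to 0). Below is a minimal standalone sketch of the corrected pair of substitutions. `expand_gaps()` and the sample string are hypothetical illustrations, not part of BioPerl.

```perl
use strict;
use warnings;

# Expand "*[ n ]*" gap markers into n 'N' characters and drop whitespace,
# using the same two substitutions that appear in the revision 15698 diff.
sub expand_gaps {
    my ($str) = @_;
    # The repetition operator is STRING x COUNT, so 'N' x $1 repeats 'N'
    # $1 times. The old form, $1 x 'N', uses 'N' as the count; it numifies
    # to 0 and the marker collapses to an empty string.
    $str =~ s{\*\[\s*(\d+)\s*\]\*}{'N' x $1}ge;
    $str =~ s{\s+}{}g;
    return $str;
}

# Made-up alignment fragment for illustration.
print expand_gaps('ACG*[ 4 ]*UUA'), "\n";   # prints ACGNNNNUUA
```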
From bugzilla-daemon at portal.open-bio.org Tue May 19 19:28:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 19 May 2009 19:28:52 -0400 Subject: [Bioperl-guts-l] [Bug 2830] Bio::Search::Tiling development/use case data In-Reply-To: Message-ID: <200905192328.n4JNSq5x015891@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2830 maj at fortinbras.us changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue May 19 19:29:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 19 May 2009 19:29:55 -0400 Subject: [Bioperl-guts-l] [Bug 2830] Bio::Search::Tiling development/use case data In-Reply-To: Message-ID: <200905192329.n4JNTtlH016005@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2830 maj at fortinbras.us changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bioperl-guts-l at bioperl.org |maj at fortinbras.us Status|ASSIGNED |NEW -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed May 20 11:37:32 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 May 2009 11:37:32 -0400 Subject: [Bioperl-guts-l] [Bug 2831] New: Build.PL appears to break automated testing with CPANPLUS Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2831 Summary: Build.PL appears to break automated testing with CPANPLUS Product: BioPerl Version: main-trunk Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: cjfields at bioperl.org CPAN testers is reporting a disproportionate number of 'UNKNOWN' test results that have one issue in common: CPANPLUS. Not sure if this is a true bioperl bug, but this may stem from Bio::Root::Build and reliance on CPAN.pm (i.e. no built-in CPANPLUS support). http://matrix.cpantesters.org/?dist=BioPerl+1.6.0 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 20 11:40:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 May 2009 11:40:11 -0400 Subject: [Bioperl-guts-l] [Bug 2832] New: Add support for CPANPLUS to Bio::Root::Build Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2832 Summary: Add support for CPANPLUS to Bio::Root::Build Product: BioPerl Version: main-trunk Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: cjfields at bioperl.org May solve issues related to bug 2831. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed May 20 11:40:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 May 2009 11:40:39 -0400 Subject: [Bioperl-guts-l] [Bug 2831] Build.PL appears to break automated testing with CPANPLUS In-Reply-To: Message-ID: <200905201540.n4KFedAH029120@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2831 cjfields at bioperl.org changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2832 ------- Comment #1 from cjfields at bioperl.org 2009-05-20 11:40 EST ------- May rely on bug 2832. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 20 11:40:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 May 2009 11:40:39 -0400 Subject: [Bioperl-guts-l] [Bug 2832] Add support for CPANPLUS to Bio::Root::Build In-Reply-To: Message-ID: <200905201540.n4KFedhJ029125@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2832 cjfields at bioperl.org changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingOnThis| |2831 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed May 20 12:37:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 May 2009 12:37:33 -0400 Subject: [Bioperl-guts-l] [Bug 2834] New: Bio::Root::Build breaks Module::Build API Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2834 Summary: Bio::Root::Build breaks Module::Build API Product: BioPerl Version: main-trunk Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: cjfields at bioperl.org Note that this isn't a serious bug, but it should be addressed by the 1.7 release as it's an API issue. When trying to DTRT with re: to subdistributions such as bioperl-run and their respective Build.PL, attempting to install a BioPerl subdistribution w/o core present does not work if attempting to fall back to vanilla Module::Build to preinstall bioperl core (a common bootstrapping procedure for some distributions). This appears due to parameters listed for Bio::Root::Build not respecting the Module::Build API. By that, I mean several methods used for Build parameters (such as requires/recommend, autofeatures, etc) are overridden to accept different arguments, so simply falling back to Module::Build doesn't work. Not sure how to rectify this at this point w/o significant changes to Bio::Root::Build. For a long-term solution we should work on creating bp-specific decorator methods (prefixed with bp_, maybe) that are caught during construction, then do the processing and delegate to the proper Module::Build methods as needed. The current 'workaround' is to simply bail if the properly versioned Bio::Root::Build isn't present. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
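Bug 2834 proposes BioPerl-specific decorator parameters, perhaps prefixed with `bp_`, that are intercepted during construction so that only standard arguments reach Module::Build. Below is a hypothetical, self-contained sketch of that splitting step only. `split_build_args()` and the exact prefix convention are illustrative assumptions, and no Module::Build code is loaded.

```perl
use strict;
use warnings;

# Hypothetical sketch of the decorator idea floated in bug 2834: options that
# only Bio::Root::Build understands carry a 'bp_' prefix and are split off,
# so everything else can be passed untouched to Module::Build->new().
# The function name and the 'bp_' convention are assumptions for illustration.
sub split_build_args {
    my (%args) = @_;
    my (%bp, %mb);
    for my $key (keys %args) {
        if ($key =~ /^bp_(.+)$/) {
            $bp{$1} = $args{$key};    # BioPerl-specific option, prefix removed
        }
        else {
            $mb{$key} = $args{$key};  # plain Module::Build parameter
        }
    }
    return (\%bp, \%mb);
}

# Example arguments; the module names are made up.
my ($bp_opts, $mb_opts) = split_build_args(
    module_name   => 'Bio::Example::Dist',
    requires      => { 'Bio::Root::Version' => 0 },
    bp_recommends => { 'Some::Optional::Module' => 0 },
);

print join(', ', sort keys %$mb_opts), "\n";  # prints module_name, requires
print join(', ', sort keys %$bp_opts), "\n";  # prints recommends
```

In a subclass, `new()` could process `%$bp_opts` itself and pass only `%$mb_opts` to `SUPER::new`, so Module::Build never receives parameters whose meaning BioPerl has changed.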
From chmille4 at dev.open-bio.org Thu May 21 10:35:24 2009 From: chmille4 at dev.open-bio.org (Chase Miller) Date: Thu, 21 May 2009 10:35:24 -0400 Subject: [Bioperl-guts-l] [15699] bioperl-dev/trunk/Bio: Adding AlignIO, SeqIO, and TreeIO directories as a place to put future code for the nexml.pm modules Message-ID: <200905211435.n4LEZOQ8029185@dev.open-bio.org> Revision: 15699 Author: chmille4 Date: 2009-05-21 10:35:23 -0400 (Thu, 21 May 2009) Log Message: ----------- Adding AlignIO, SeqIO, and TreeIO directories as a place to put future code for the nexml.pm modules Added Paths: ----------- bioperl-dev/trunk/Bio/AlignIO/ bioperl-dev/trunk/Bio/SeqIO/ bioperl-dev/trunk/Bio/TreeIO/ From bugzilla-daemon at portal.open-bio.org Thu May 21 18:22:47 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 21 May 2009 18:22:47 -0400 Subject: [Bioperl-guts-l] [Bug 2399] BlastHSP::n gives empty values In-Reply-To: Message-ID: <200905212222.n4LMMlRj028251@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2399 maj at fortinbras.us changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #807 mime type|text/x-perl |text/plain -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From maj at dev.open-bio.org Fri May 22 00:41:22 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 00:41:22 -0400 Subject: [Bioperl-guts-l] [15700] bioperl-dev/trunk/Bio/Root: clear out most bplive modules Message-ID: <200905220441.n4M4fMn7004340@dev.open-bio.org> Revision: 15700 Author: maj Date: 2009-05-22 00:41:22 -0400 (Fri, 22 May 2009) Log Message: ----------- clear out most bplive modules Removed Paths: ------------- bioperl-dev/trunk/Bio/Root/Exception.pm bioperl-dev/trunk/Bio/Root/HTTPget.pm bioperl-dev/trunk/Bio/Root/IO.pm bioperl-dev/trunk/Bio/Root/Root.pm bioperl-dev/trunk/Bio/Root/RootI.pm bioperl-dev/trunk/Bio/Root/Storable.pm bioperl-dev/trunk/Bio/Root/Utilities.pm Deleted: bioperl-dev/trunk/Bio/Root/Exception.pm =================================================================== --- bioperl-dev/trunk/Bio/Root/Exception.pm 2009-05-21 14:35:23 UTC (rev 15699) +++ bioperl-dev/trunk/Bio/Root/Exception.pm 2009-05-22 04:41:22 UTC (rev 15700) @@ -1,471 +0,0 @@ -#----------------------------------------------------------------- -# $Id: Exception.pm 15549 2009-02-21 00:48:48Z maj $ -# -# BioPerl module Bio::Root::Exception -# -# Please direct questions and support issues to -# -# Cared for by Steve Chervitz -# -# You may distribute this module under the same terms as perl itself -#----------------------------------------------------------------- - -=head1 NAME - -Bio::Root::Exception - Generic exception objects for Bioperl - -=head1 SYNOPSIS - -=head2 Throwing exceptions using L: - - use Bio::Root::Exception; - use Error; - - # Set Error::Debug to include stack trace data in the error messages - $Error::Debug = 1; - - $file = shift; - open (IN, $file) || - throw Bio::Root::FileOpenException ( "Can't open file $file for reading", $!); - -=head2 Throwing exceptions using L: - - # Here we have an object that ISA Bio::Root::Root, so it inherits throw(). 
- - open (IN, $file) || - $object->throw(-class => 'Bio::Root::FileOpenException', - -text => "Can't open file $file for reading", - -value => $!); - -=head2 Catching and handling exceptions using L: - - use Bio::Root::Exception; - use Error qw(:try); - - # Note that we need to import the 'try' tag from Error.pm - - # Set Error::Debug to include stack trace data in the error messages - $Error::Debug = 1; - - $file = shift; - try { - open (IN, $file) || - throw Bio::Root::FileOpenException ( "Can't open file $file for reading", $!); - } - catch Bio::Root::FileOpenException with { - my $err = shift; - print STDERR "Using default input file: $default_file\n"; - open (IN, $default_file) || die "Can't open $default_file"; - } - otherwise { - my $err = shift; - print STDERR "An unexpected exception occurred: \n$err"; - - # By placing an the error object reference within double quotes, - # you're invoking its stringify() method. - } - finally { - # Any code that you want to execute regardless of whether or not - # an exception occurred. - }; - # the ending semicolon is essential! - - -=head2 Defining a new Exception type as a subclass of Bio::Root::Exception: - - @Bio::TestException::ISA = qw( Bio::Root::Exception ); - - -=head1 DESCRIPTION - -=head2 Exceptions defined in L - -These are generic exceptions for typical problem situations that could arise -in any module or script. - -=over 8 - -=item Bio::Root::Exception() - -=item Bio::Root::NotImplemented() - -=item Bio::Root::IOException() - -=item Bio::Root::FileOpenException() - -=item Bio::Root::SystemException() - -=item Bio::Root::BadParameter() - -=item Bio::Root::OutOfRange() - -=item Bio::Root::NoSuchThing() - -=back - -Using defined exception classes like these is a good idea because it -indicates the basic nature of what went wrong in a convenient, -computable way. 
- -If there is a type of exception that you want to throw -that is not covered by the classes listed above, it is easy to define -a new one that fits your needs. Just write a line like the following -in your module or script where you want to use it (or put it somewhere -that is accessible to your code): - - @NoCanDoException::ISA = qw( Bio::Root::Exception ); - -All of the exceptions defined in this module inherit from a common -base class exception, Bio::Root::Exception. This allows a user to -write a handler for all Bioperl-derived exceptions as follows: - - use Bio::Whatever; - use Error qw(:try); - - try { - # some code that depends on Bioperl - } - catch Bio::Root::Exception with { - my $err = shift; - print "A Bioperl exception occurred:\n$err\n"; - }; - -So if you do create your own exceptions, just be sure they inherit -from Bio::Root::Exception directly, or indirectly by inheriting from a -Bio::Root::Exception subclass. - -The exceptions in Bio::Root::Exception are extensions of Graham Barr's -L module available from CPAN. Despite this dependency, the -L module does not explicitly C. -This permits Bio::Root::Exception to be loaded even when -Error.pm is not available. - -=head2 Throwing exceptions within Bioperl modules - -Error.pm is not part of the Bioperl distibution, and may not be -present within any given perl installation. So, when you want to -throw an exception in a Bioperl module, the safe way to throw it -is to use L which can use Error.pm -when it's available. See documentation in Bio::Root::Root for details. - -=head1 SEE ALSO - -See the C directory of the Bioperl distribution for -working demo code. - -L for information about throwing -L-based exceptions. - -L (available from CPAN, author: GBARR) - -Error.pm is helping to guide the design of exception handling in Perl 6. 
-See these RFC's: - - http://dev.perl.org/rfc/63.pod - - http://dev.perl.org/rfc/88.pod - - -=head1 AUTHOR - -Steve Chervitz Esac at bioperl.orgE - -=head1 COPYRIGHT - -Copyright (c) 2001 Steve Chervitz. All Rights Reserved. - -This library is free software; you can redistribute it and/or modify -it under the same terms as Perl itself. - -=head1 DISCLAIMER - -This software is provided "as is" without warranty of any kind. - -=head1 EXCEPTIONS - -=cut - -# Define some generic exceptions.' - -package Bio::Root::Exception; -use Bio::Root::Version; - -use strict; - -my $debug = $Error::Debug; # Prevents the "used only once" warning. -my $DEFAULT_VALUE = "__DUMMY__"; # Permits eval{} based handlers to work - -=head2 L - - Purpose : A generic base class for all BioPerl exceptions. - By including a "catch Bio::Root::Exception" block, you - should be able to trap all BioPerl exceptions. - Example : throw Bio::Root::Exception("A generic exception", $!); - -=cut - -#--------------------------------------------------------- -@Bio::Root::Exception::ISA = qw( Error ); -#--------------------------------------------------------- - -=head2 Methods defined by Bio::Root::Exception - -=over 4 - -=item L - - Purpose : Guarantees that -value is set properly before - calling Error::new(). - - Arguments: key-value style arguments same as for Error::new() - - You can also specify plain arguments as ($message, $value) - where $value is optional. - - -value, if defined, must be non-zero and not an empty string - in order for eval{}-based exception handlers to work. - These require that if($@) evaluates to true, which will not - be the case if the Error has no value (Error overloads - numeric operations to the Error::value() method). - - It is OK to create Bio::Root::Exception objects without - specifing -value. In this case, an invisible dummy value is used. - - If you happen to specify a -value of zero (0), it will - be replaced by the string "The number zero (0)".
- - If you happen to specify a -value of empty string (""), it will - be replaced by the string "An empty string ("")". - -=cut - -sub new { - my ($class, @args) = @_; - my ($value, %params); - if( @args % 2 == 0 && $args[0] =~ /^-/) { - %params = @args; - $value = $params{'-value'}; - } - else { - $params{-text} = $args[0]; - $value = $args[1]; - } - - if( defined $value ) { - $value = "The number zero (0)" if $value =~ /^\d+$/ && $value == 0; - $value = "An empty string (\"\")" if $value eq ""; - } - else { - $value ||= $DEFAULT_VALUE; - } - $params{-value} = $value; - - my $self = $class->SUPER::new( %params ); - return $self; -} - -=item pretty_format() - - Purpose : Get a nicely formatted string containing information about the - exception. Format is similar to that produced by - Bio::Root::Root::throw(), with the addition of the name of - the exception class in the EXCEPTION line and some other - data available via the Error object. - Example : print $error->pretty_format; - -=cut - -sub pretty_format { - my $self = shift; - my $msg = $self->text; - my $stack = ''; - if( $Error::Debug ) { - $stack = $self->_reformat_stacktrace(); - } - my $value_string = $self->value ne $DEFAULT_VALUE ? "VALUE: ".$self->value."\n" : ""; - my $class = ref($self); - - my $title = "------------- EXCEPTION: $class -------------"; - my $footer = "\n" . '-' x CORE::length($title); - my $out = "\n$title\n" . - "MSG: $msg\n". $value_string. $stack. $footer . "\n"; - return $out; -} - - -# Reformatting of the stack performed by _reformat_stacktrace: -# 1. Shift the file:line data in line i to line i+1. -# 2. change xxx::__ANON__() to "try{} block" -# 3. skip the "require" and "Error::subs::try" stack entries (boring) -# This means that the first line in the stack won't have any file:line data -# But this isn't a big issue since it's for a Bio::Root::-based method -# that doesn't vary from exception to exception. 
- -sub _reformat_stacktrace { - my $self = shift; - my $msg = $self->text; - my $stack = $self->stacktrace(); - $stack =~ s/\Q$msg//; - my @stack = split( /\n/, $stack); - my @new_stack = (); - my ($method, $file, $linenum, $prev_file, $prev_linenum); - my $stack_count = 0; - foreach my $i( 0..$#stack ) { - # print "STACK-ORIG: $stack[$i]\n"; - if( ($stack[$i] =~ /^\s*([^(]+)\s*\(.*\) called at (\S+) line (\d+)/) || - ($stack[$i] =~ /^\s*(require 0) called at (\S+) line (\d+)/)) { - ($method, $file, $linenum) = ($1, $2, $3); - $stack_count++; - } @@ Diff output truncated at 10000 characters. @@ From bugzilla-daemon at portal.open-bio.org Fri May 22 06:29:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 May 2009 06:29:53 -0400 Subject: [Bioperl-guts-l] [Bug 2835] New: "Bio::DB::SeqFeature::Store->new( ... -namespace => 'x' )->attributes" borked Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2835 Summary: "Bio::DB::SeqFeature::Store->new( ... -namespace => 'x' )->attributes" borked Product: BioPerl Version: main-trunk Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Bio::DB::GFF AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: dan.bolser at gmail.com svn update At revision 15700. I set a -namespace in my call to "Bio::DB::SeqFeature::Store->new", and later call the "attributes" method of the returned object. I see the fatal error message: Can't use an undefined value as an ARRAY reference at /local/Scratch/dbolser/perl5/lib/perl5/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 464. 
Here is the patch to fix the bug: diff -u \ ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/DBI/mysql.pm~ \ ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/DBI/mysql.pm --- ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/DBI/mysql.pm~ 2009-05-11 15:25:55.000000000 +0100 +++ ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/DBI/mysql.pm 2009-05-22 11:26:26.000000000 +0100 @@ -460,7 +460,10 @@ sub attributes { my $self = shift; my $dbh = $self->dbh; - my $a = $dbh->selectcol_arrayref('SELECT tag FROM attributelist'); + my $attributelist_table = $self->_attributelist_table; + + my $a = $dbh->selectcol_arrayref("SELECT tag FROM $attributelist_table") + or $self->throw($dbh->errstr); return @$a; } Also, please clarify how you would like to receive diffs here: http://www.bioperl.org/wiki/Bugs#Submitting_Patches -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From maj at dev.open-bio.org Fri May 22 07:47:08 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 07:47:08 -0400 Subject: [Bioperl-guts-l] [15701] bioperl-live/trunk/Bio/DB/SeqFeature/Store/DBI/mysql.pm: patch for bug #2835, courtesy of Dan Bolser Message-ID: <200905221147.n4MBl8hM005528@dev.open-bio.org> Revision: 15701 Author: maj Date: 2009-05-22 07:47:06 -0400 (Fri, 22 May 2009) Log Message: ----------- patch for bug #2835, courtesy of Dan Bolser thanks! 
Modified Paths: -------------- bioperl-live/trunk/Bio/DB/SeqFeature/Store/DBI/mysql.pm Modified: bioperl-live/trunk/Bio/DB/SeqFeature/Store/DBI/mysql.pm =================================================================== --- bioperl-live/trunk/Bio/DB/SeqFeature/Store/DBI/mysql.pm 2009-05-22 04:41:22 UTC (rev 15700) +++ bioperl-live/trunk/Bio/DB/SeqFeature/Store/DBI/mysql.pm 2009-05-22 11:47:06 UTC (rev 15701) @@ -460,7 +460,10 @@ sub attributes { my $self = shift; my $dbh = $self->dbh; - my $a = $dbh->selectcol_arrayref('SELECT tag FROM attributelist'); + my $attributelist_table = $self->_attributelist_table; + + my $a = $dbh->selectcol_arrayref("SELECT tag FROM $attributelist_table") + or $self->throw($dbh->errstr); return @$a; } From bugzilla-daemon at portal.open-bio.org Fri May 22 07:47:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 May 2009 07:47:45 -0400 Subject: [Bioperl-guts-l] [Bug 2835] "Bio::DB::SeqFeature::Store->new( ... -namespace => 'x' )->attributes" borked In-Reply-To: Message-ID: <200905221147.n4MBljWK031081@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2835 maj at fortinbras.us changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from maj at fortinbras.us 2009-05-22 07:47 EST ------- Thanks for the patch, Dan- MAJ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
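The bug 2835 patch replaces the hard-coded `attributelist` table with `$self->_attributelist_table`, so that `attributes()` respects a `-namespace` setting. Below is a minimal sketch of the same idea, independent of DBI. `qualify_table()`, `attributes_sql()` and the `<namespace>_<table>` naming scheme are assumptions for illustration, not the store's actual internals.

```perl
use strict;
use warnings;

# Sketch of the idea behind the bug 2835 fix: once a -namespace is in effect,
# table names must be qualified before they are interpolated into SQL.
# qualify_table() and the "<namespace>_<table>" scheme are assumptions made
# for illustration, not a copy of the Bio::DB::SeqFeature::Store code.
sub qualify_table {
    my ($namespace, $table) = @_;
    if (defined $namespace && length $namespace) {
        return "${namespace}_$table";
    }
    return $table;    # no namespace: use the bare table name
}

sub attributes_sql {
    my ($namespace) = @_;
    # Keep a space between FROM and the interpolated table name.
    return 'SELECT tag FROM ' . qualify_table($namespace, 'attributelist');
}

print attributes_sql('x'), "\n";    # prints SELECT tag FROM x_attributelist
print attributes_sql(undef), "\n";  # prints SELECT tag FROM attributelist
```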
From maj at dev.open-bio.org Fri May 22 09:42:36 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 09:42:36 -0400 Subject: [Bioperl-guts-l] [15702] bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm: filter-per-algorithm functionality (by strand, by frame) Message-ID: <200905221342.n4MDgapw006196@dev.open-bio.org> Revision: 15702 Author: maj Date: 2009-05-22 09:42:35 -0400 (Fri, 22 May 2009) Log Message: ----------- filter-per-algorithm functionality (by strand, by frame) Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-22 11:47:06 UTC (rev 15701) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-22 13:42:35 UTC (rev 15702) @@ -104,8 +104,34 @@ use Bio::Search::Tiling::TilingI; use Bio::Search::Tiling::MapTileUtils; +# use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); +# fast, clear, nasty, brutish and short. 
+# for _allowable_filters() +# covers BLAST, FAST families +# FASTA is ambiguous (nt or aa) based on alg name only + +my $filter_lookup = { + 'N' => { 'q' => qr/[s]/, + 'h' => qr/[s]/ }, + 'P' => { 'q' => '', + 'h' => '' }, + 'X' => { 'q' => qr/[sf]/, + 'h' => '' }, + 'Y' => { 'q' => qr/[sf]/, + 'h' => '' }, + 'TA' => { 'q' => '', + 'h' => qr/[sf]/ }, + 'TN' => { 'q' => '', + 'h' => qr/[sf]/ }, + 'TX' => { 'q' => qr/[sf]/, + 'h' => qr/[sf]/ }, + 'TY' => { 'q' => qr/[sf]/, + 'h' => qr/[sf]/ } +}; + + =head2 CONSTRUCTOR =head2 new @@ -126,52 +152,27 @@ my $class = shift; my @args = @_; my $self = $class->SUPER::new; - my($hit, $qstrand, $hstrand) = $self->_rearrange( [qw( HIT QSTRAND HSTRAND )], @args ); + my($hit, $qstrand, $hstrand, $qframe, $hframe) = $self->_rearrange( [qw( HIT QSTRAND HSTRAND QFRAME HFRAME )], @args ); $self->throw("HitI object required") unless $hit; $self->throw("Argument must be HitI object") unless ( ref $hit && $hit->isa('Bio::Search::Hit::HitI') ); + $self->{hit} = $hit; my @hsps; - # filter if requested and allowed - if ($qstrand) { - # check value - if ( abs($qstrand) != 1 ) { - $self->throw("Bad argument: QSTRAND must be either +1 or -1"); - } - # check algorithm - if (!_involves_dna($hit,'query')) { - $self->warn("Query does not involve a dna sequence; QSTRAND ignored"); - $qstrand = undef; - } + $self->_check_args($qstrand, $hstrand, $qframe, $hframe); + # filter if requested + while (local $_ = $hit->next_hsp) { + push @hsps, $_ if ( ( !$qstrand || ($qstrand == $_->strand('query'))) && + ( !$hstrand || ($hstrand == $_->strand('hit')) ) && + ( !defined $qframe || ($qframe == $_->frame('query')) ) && + ( !defined $hframe || ($hframe == $_->frame('hit')) ) ); } - if ($hstrand) { - # check value - if ( abs($hstrand) != 1 ) { - $self->throw("Bad argument: HSTRAND must be either +1 or -1"); - } - # check algorithm - if (!_involves_dna($hit,'hit')) { - $self->warn("Subject does not involve a dna sequence; HSTRAND ignored"); - $hstrand =
undef; - } - } - if ( !($qstrand || $hstrand) ) { - while (local $_ = $hit->next_hsp) { push @hsps, $_; } - } - else { # filter - while (local $_ = $hit->next_hsp) { - push @hsps, $_ if ( ( !$qstrand || ($qstrand == $_->strand('query'))) && - ( !$hstrand || ($hstrand == $_->strand('hit') )) ); - } - } - - - $self->warn("No HSPs present in hit") unless (@hsps); + $self->warn("No HSPs present in hit after filtering") unless (@hsps); $self->hsps(\@hsps); - $self->{hit} = $hit; $self->{"strand_query"} = $qstrand; - $self->{"strand_hit"} = $hstrand; - + $self->{"strand_hit"} = $hstrand; + $self->{"frame_query"} = $qframe; + $self->{"strand_hit"} = $hframe; return $self; } @@ -217,7 +218,7 @@ sub rewind_tilings{ my $self = shift; - my $typen = shift; + my $type = shift; $type ||= 'query'; $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); $type = 'hit' if $type eq 'subject'; @@ -661,52 +662,79 @@ return $self->{"_tiling_iterator_$type"}; } -=head2 _involves_dna +=head2 _allowable_filters - Title : _involves_dna - Usage : _involves_dna($Bio_Search_Hit_HitI, $type) - Function: Test if sequences of $type are/were dna sequences, + Title : _allowable_filters + Usage : _allowable_filters($Bio_Search_Hit_HitI, $type) + Function: Return the HSP filters (strand, frame) allowed, based on the reported algorithm - Returns : True if hit involves dna sequence + Returns : String encoding allowable filters: + s = strand, f = frame + Empty string if no filters allowed + undef if algorithm unrecognized Args : A Bio::Search::Hit::HitI object, scalar $type, one of 'hit', 'subject', 'query'; default is 'query' =cut -sub _involves_dna { +sub _allowable_filters { my $hit = shift; my $type = shift; - $type ||= 'query'; - unless (grep /^$type$/, qw( hit query subject ) ) { - warn("Unknown type '$type'; returning false"); - return 0; + $type ||= 'q'; + unless (grep /^$type$/, qw( h q s ) ) { + warn("Unknown type '$type'; returning ''"); + return ''; } - $type
= 'hit' if $type eq 'subject'; + $type = 'h' if $type eq 's'; my $alg = $hit->algorithm; for ($alg) { + /MEGABLAST/i && do { + return qr/[s]/; + }; /(.?)BLAST(.?)/i && do { - return 1 if ( ($2 =~ /N/i) ); - return 1 if (($2 =~ /X/i) || ($1 =~ /T/i) && ($type eq 'query')); - last; + return $$filter_lookup{$1.$2}{$type}; }; /(.?)FAST(.?)/ && do { - return 1 if (( $2 =~ /A/i ) && ($type eq 'subject')); - return 1 if (($1 =~ /T/i) && ($2 =~ /X/i) && ($type eq 'subject')); - return 1 if (( $1 !~ /T/i ) && ($2 =~ /A/i) && - ($type eq 'query')); - return 1 if (($2 =~ /X/i) && ($type eq 'query')); - last; + return $$filter_lookup{$1.$2}{$type}; }; do { # unrecognized last; }; } - return 0; + return; } +=head2 _check_args + Title : _check_args + Usage : _check_args($qstrand, $hstrand, $qframe, $hframe) + Function: Throw if strand/frame parms out of bounds or set + uselessly for the underlying algorithm + Returns : True on success + Args : requested filter arguments to constructor +=cut +no strict qw( refs ); +sub _check_args { + my ($self, $qstrand, $hstrand, $qframe, $hframe) = @_; + $self->throw("Strand filter arguments must be +1 or -1") + if ( $qstrand && !(abs($qstrand)==1) or + $hstrand && !(abs($hstrand)==1) ); + $self->throw("Frame filter arguments must be one of (-2,-1,0,1,2)") + if ( $qframe && !(grep {abs($qframe)} (0, 1, 2)) or + $hframe && !(grep {abs($hframe)} (0, 1, 2)) ); + + for my $t qw( q h ) { + for my $f qw( strand frame ) { + my $allowed = _allowable_filters($self->hit, $t); + $self->throw("Filter '$t$f' is not useful for ".$self->hit->algorithm." 
results") + if ( eval "\$${t}${f}" && !($allowed && $f =~ /^$allowed/) ); + } + } + return 1; +} + 1; From maj at dev.open-bio.org Fri May 22 09:43:21 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 09:43:21 -0400 Subject: [Bioperl-guts-l] [15703] bioperl-dev/trunk/t/SearchIO/Tiling.t: arg testing filter-per-algorithm functionality Message-ID: <200905221343.n4MDhLvT006229@dev.open-bio.org> Revision: 15703 Author: maj Date: 2009-05-22 09:43:21 -0400 (Fri, 22 May 2009) Log Message: ----------- arg testing filter-per-algorithm functionality Modified Paths: -------------- bioperl-dev/trunk/t/SearchIO/Tiling.t Modified: bioperl-dev/trunk/t/SearchIO/Tiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-22 13:42:35 UTC (rev 15702) +++ bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-22 13:43:21 UTC (rev 15703) @@ -4,7 +4,7 @@ BEGIN { use lib '.'; use Bio::Root::Test; - test_begin(-tests => 19); + test_begin(-tests => 1000 ); } use_ok('Bio::Search::Tiling::MapTiling'); @@ -25,6 +25,7 @@ ok(my $test_hit = $_, 'got test hit'); ok(my $tiling = Bio::Search::Tiling::MapTiling->new($test_hit), 'create tiling'); + # TilingI compliance isa_ok($tiling, 'Bio::Search::Tiling::TilingI'); @@ -68,5 +69,52 @@ while ($tiling->next_tiling('subject')) {$sn++}; is ($sn, 256, 'tiling iterator regression test(3, rewind)'); -# more to come +# test the filters and filter checking +# arrays are of the form +# [$format, $file, \@living_filters, \@dying_filters] +# @filters = ($qstrand, $hstrand, $qframe, $hframe) + +my %examples = ( + 'BLASTN' => ['blast', 'AE003528_ecoli.bls', + [1,-1, undef, undef], + [1,-1, 1, 1]], + 'BLASTP' => ['blast', 'catalase-webblast.BLASTP', + [undef, undef, undef, undef], + [1, undef, undef, undef]], + 'BLASTX' => ['blast', 'dnaEbsub_ecoli.wublastx', + [1, undef, undef, undef], + [undef, 1, undef, 1]], + 'TBLASTN'=> ['blast', 'dnaEbsub_ecoli.wutblastn', + [undef, 1, 
undef, 1], + [1, undef, 1, undef]], + 'TBLASTX'=> ['blast', 'dnaEbsub_ecoli.wutblastx', + [1, 1, 0, 1], + [1, -2, 3, 3]], + 'FASTA' => ['fasta', 'cysprot_vs_gadfly.FASTA', + [undef, undef, undef, undef], + [1, undef, undef, undef]], + 'FASTXY' => ['fasta', '5X_1895.FASTXY', + [1, undef, undef, undef], + [undef, 1, undef, 1]], + 'MEGABLAST' => ['blast', '503384.MEGABLAST.2', + [1,-1, undef, undef], + [1,-1, 1, 1]], + 'TFASTA' => undef, + 'TFASTX' => undef + ); + +foreach (keys %examples) { + next unless $examples{$_}; + ok( my $blio = Bio::SearchIO->new( -format=>$examples{$_}[0], + -file =>test_input_file($examples{$_}[1])), + "$_ data file"); + my $hit = $blio->next_result->next_hit; + ok( $tiling = Bio::Search::Tiling::MapTiling->new($hit, @{$examples{$_}[2]}), "tiling object created for $_ hit"); + dies_ok { Bio::Search::Tiling::MapTiling->new($hit, @{$examples{$_}[3]}) } "tiling object arg exception check for $_ hit"; + 1; +} + + + + From bugzilla-daemon at portal.open-bio.org Fri May 22 11:21:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 May 2009 11:21:13 -0400 Subject: [Bioperl-guts-l] [Bug 2836] New: The bp_seqfeature_load.pl script dosn't support the '-namespace' option to new Bio::DB::SeqFeature::Store Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2836 Summary: The bp_seqfeature_load.pl script dosn't support the '- namespace' option to new Bio::DB::SeqFeature::Store Product: BioPerl Version: main-trunk Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Bio::DB::GFF AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: dan.bolser at gmail.com To help keep organised I use the -namespace option of Bio::DB::SeqFeature::Store->new : http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/DB/SeqFeature/Store.pm#new However, this option is not settable from the command line when using the BioPerl script bp_seqfeature_load.pl -- Configure bugmail: 
http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 22 11:23:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 May 2009 11:23:34 -0400 Subject: [Bioperl-guts-l] [Bug 2836] The bp_seqfeature_load.pl script dosn't support the '-namespace' option to new Bio::DB::SeqFeature::Store In-Reply-To: Message-ID: <200905221523.n4MFNYWU019392@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2836 ------- Comment #1 from dan.bolser at gmail.com 2009-05-22 11:23 EST ------- Created an attachment (id=1302) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1302&action=view) A minor change to the GetOptions to include a namespace option (with a null default for backwards compatibility). I took the opportunity to tidy up and comment the script a bit. The 'Usage' was already going into technical details, which can be very confusing for a beginner let me tell you! Hope I didn't hack too far. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
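For readers following bug 2836: the change described in Comment #1 amounts to one more option whose value defaults to undef and is forwarded to `Bio::DB::SeqFeature::Store->new` only when it is set. The sketch below is not the attached patch. The helper `store_args_from_cli`, the option names, the `DBI::mysql` adaptor and the DSN default are illustrative assumptions; only the `-namespace` constructor argument comes from the report.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn command-line words into constructor arguments
# for Bio::DB::SeqFeature::Store->new, plus the remaining file names.
sub store_args_from_cli {
    my @argv = @_;
    my $adaptor = 'DBI::mysql';        # illustrative default
    my $dsn     = 'dbi:mysql:test';    # illustrative default
    my $namespace;                     # stays undef unless --namespace is given

    GetOptionsFromArray(
        \@argv,
        'adaptor=s'   => \$adaptor,
        'dsn=s'       => \$dsn,
        'namespace=s' => \$namespace,
    ) or die "Usage: $0 [--adaptor A] [--dsn DSN] [--namespace NS] file ...\n";

    my @args = ( -adaptor => $adaptor, -dsn => $dsn );

    # Omitting -namespace when it was not requested keeps existing
    # command lines behaving exactly as before (the "null default").
    push @args, ( -namespace => $namespace ) if defined $namespace;

    # Non-option words (the GFF files to load) are left in @argv.
    return ( \@args, \@argv );
}

# With BioPerl installed, the result would be used roughly as:
#   my ($args, $files) = store_args_from_cli(@ARGV);
#   my $store = Bio::DB::SeqFeature::Store->new(@$args);
```

Keeping the option undef by default, rather than defaulting it to an empty string, means a database loaded without `--namespace` is addressed exactly as it was before the change.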
From bugzilla-daemon at portal.open-bio.org Fri May 22 11:24:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 May 2009 11:24:01 -0400 Subject: [Bioperl-guts-l] [Bug 2836] The bp_seqfeature_load.pl script dosn't support the '-namespace' option to new Bio::DB::SeqFeature::Store In-Reply-To: Message-ID: <200905221524.n4MFO1GE019479@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2836 dan.bolser at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1302|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1302 is|0                           |1
              patch|                            |

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From maj at dev.open-bio.org Fri May 22 12:26:41 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 12:26:41 -0400 Subject: [Bioperl-guts-l] [15704] bioperl-dev/trunk/t/SearchIO/Tiling.t: keyword subst Message-ID: <200905221626.n4MGQfLV006633@dev.open-bio.org> Revision: 15704 Author: maj Date: 2009-05-22 12:26:41 -0400 (Fri, 22 May 2009) Log Message: ----------- keyword subst Modified Paths: -------------- bioperl-dev/trunk/t/SearchIO/Tiling.t Property Changed: ---------------- bioperl-dev/trunk/t/SearchIO/Tiling.t Modified: bioperl-dev/trunk/t/SearchIO/Tiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-22 13:43:21 UTC (rev 15703) +++ bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-22 16:26:41 UTC (rev 15704) @@ -1,5 +1,5 @@ #-*-perl-*- -#$Id: t.pl 420 2009-05-18 04:11:18Z maj $ +#$Id$ use strict; BEGIN { use lib '.'; Property changes on: bioperl-dev/trunk/t/SearchIO/Tiling.t ___________________________________________________________________ Name: svn:keywords + Id Date Author Rev From maj at
dev.open-bio.org Fri May 22 12:27:06 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 12:27:06 -0400 Subject: [Bioperl-guts-l] [15705] bioperl-dev/trunk/Bio/Search/Tiling: keyword subst Message-ID: <200905221627.n4MGR6F6006664@dev.open-bio.org> Revision: 15705 Author: maj Date: 2009-05-22 12:27:06 -0400 (Fri, 22 May 2009) Log Message: ----------- keyword subst Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm Property Changed: ---------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-22 16:26:41 UTC (rev 15704) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-22 16:27:06 UTC (rev 15705) @@ -1,4 +1,4 @@ -#$Id: MapTileUtils.pm 433 2009-05-19 19:19:38Z maj $ +#$Id$ package Bio::Search::Tiling::MapTileUtils; use strict; use warnings; Property changes on: bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm ___________________________________________________________________ Name: svn:keywords + Id Date Author Rev Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-22 16:26:41 UTC (rev 15704) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-22 16:27:06 UTC (rev 15705) @@ -1,4 +1,4 @@ -# $Id: MapTiling.pm 433 2009-05-19 19:19:38Z maj $ +# $Id$ # # BioPerl module for Bio::Search::Tiling::MapTiling # Property changes on: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm ___________________________________________________________________ Name: svn:keywords + Id Date 
Author Rev Modified: bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-22 16:26:41 UTC (rev 15704) +++ bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-22 16:27:06 UTC (rev 15705) @@ -1,4 +1,4 @@ -# $Id: TilingI.pm 432 2009-05-19 19:16:03Z maj $ +# $Id$ # # BioPerl module for Bio::Search::Tiling::TilingI # Property changes on: bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm ___________________________________________________________________ Name: svn:keywords + Id Date Author Rev From maj at dev.open-bio.org Fri May 22 18:00:24 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 18:00:24 -0400 Subject: [Bioperl-guts-l] [15706] bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm: coverage_map_as_text(): print a 'graphic' Message-ID: <200905222200.n4MM0O17007336@dev.open-bio.org> Revision: 15706 Author: maj Date: 2009-05-22 18:00:24 -0400 (Fri, 22 May 2009) Log Message: ----------- coverage_map_as_text(): print a 'graphic' representation of the coverage map; print it and check my work (looks like a contig!) 
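The "graphic" in this log message is a grid with one row per HSP and one column per interval of the coverage map, with `*` wherever an HSP covers that interval. The sketch below shows the idea in isolation. It uses plain `[start, end]` pairs instead of HSP objects, and the helpers `interval_coverage` and `coverage_grid` are hypothetical stand-ins, not the MapTiling methods: the diff does not show how `_calc_coverage_map` builds its map. Dropping positions that no HSP covers is an assumption of the sketch.

```perl
use strict;
use warnings;

# Split the union of the input intervals into elementary pieces and record,
# for each piece, which inputs (by index) cover it.
# Input: list of [start, end] pairs, 1-based and inclusive.
sub interval_coverage {
    my @ints = @_;
    my %cut;
    for my $iv (@ints) {
        $cut{ $iv->[0] }     = 1;    # coverage may change at a start
        $cut{ $iv->[1] + 1 } = 1;    # ...and just past an end
    }
    my @pts = sort { $a <=> $b } keys %cut;
    my @map;
    for my $k ( 0 .. $#pts - 1 ) {
        my ( $s, $e ) = ( $pts[$k], $pts[ $k + 1 ] - 1 );
        my @cover = grep { $ints[$_][0] <= $s && $ints[$_][1] >= $e } 0 .. $#ints;
        push @map, [ [ $s, $e ], \@cover ] if @cover;    # skip uncovered gaps
    }
    return @map;
}

# Render the map as tab-separated lines: one row per input interval ("HSP"),
# one column per map piece, '*' where the row's interval covers the piece.
sub coverage_grid {
    my @ints = @_;
    my @map  = interval_coverage(@ints);
    my @ret  = ( "\tIntvl\n", join( "\t", 'HSPS', 0 .. $#map ) . "\n" );
    for my $h ( 0 .. $#ints ) {
        my @row;
        for my $piece (@map) {
            my %in = map { $_ => 1 } @{ $piece->[1] };
            push @row, $in{$h} ? '*' : '';
        }
        push @ret, join( "\t", $h, @row ) . "\n";
    }
    return @ret;
}

print coverage_grid( [ 1, 100 ], [ 50, 150 ], [ 200, 250 ] );
```

With these three intervals the grid has four columns (1..49, 50..100, 101..150 and 200..250); the uncovered 151..199 stretch is dropped. Overlapping HSPs put stars side by side in a shared column, which is why the print-out resembles a contig layout.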
Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-22 16:27:06 UTC (rev 15705) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-22 22:00:24 UTC (rev 15706) @@ -99,6 +99,7 @@ use warnings; # Object preamble - inherits from Bio::Root::Root +use lib '../../..'; use Bio::Root::Root; use Bio::Search::Tiling::TilingI; @@ -159,7 +160,7 @@ $self->{hit} = $hit; my @hsps; - $self->_check_args($qstrand, $hstrand, $qframe, $hframe); + $self->_check_new_args($qstrand, $hstrand, $qframe, $hframe); # filter if requested while (local $_ = $hit->next_hsp) { push @hsps, $_ if ( ( !$qstrand || ($qstrand == $_->strand('query'))) && @@ -197,10 +198,7 @@ sub next_tiling{ my $self = shift; my $type = shift; - $type ||= 'query'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; - + $self->_check_type_arg(\$type); return $self->_tiling_iterator($type)->(); } @@ -219,10 +217,7 @@ sub rewind_tilings{ my $self = shift; my $type = shift; - $type ||= 'query'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; - + $self->_check_type_arg(\$type); return $self->_tiling_iterator($type)->('REWIND'); } @@ -245,10 +240,8 @@ sub identities{ my $self = shift; my ($type, $action) = @_; - $type ||= 'query'; + $self->_check_type_arg(\$type); $action ||= 'exact'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); if (!defined $self->{"identities_${type}_${action}"}) { $self->_calc_stats($type, $action); @@ -273,10 +266,8 @@ sub conserved{ my $self = shift; my ($type, 
$action) = @_; - $type ||= 'query'; + $self->_check_type_arg(\$type); $action ||= 'exact'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); if (!defined $self->{"conserved_${type}_${action}"}) { $self->_calc_stats($type, $action); @@ -302,10 +293,9 @@ sub length{ my $self = shift; my ($type,$action) = @_; - $type ||= 'query'; + $self->_check_type_arg(\$type); + $action ||= 'exact'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); if (!defined $self->{"length_${type}_${action}"}) { $self->_calc_stats($type, $action); @@ -348,15 +338,68 @@ sub coverage_map{ my $self = shift; my $type = shift; - $type ||= 'query'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; + $self->_check_type_arg(\$type); if (!defined $self->{"coverage_map_$type"}) { $self->_calc_coverage_map($type); } return @{$self->{"coverage_map_$type"}}; } +=head2 coverage_map_as_text + + Title : coverage_map_as_text + Usage : $tiling->coverage_map_as_text($type, $legend_flag) + Function: Format a text-graphic representation of the + coverage map + Returns : an array of scalar strings, suitable for printing + Args : $type: one of 'query', 'hit', 'subject' + $legend_flag: boolean; print a legend indicating + the actual interval coordinates for each component + interval and hsp (in the $type sequence context) + Example : print $tiling->coverage_map_as_text('query',1); + +=cut + +sub coverage_map_as_text{ + my $self = shift; + my $type = shift; + my $legend_q = shift; + $self->_check_type_arg(\$type); + my @map = $self->coverage_map($type); + my @ret; + my @hsps = $self->hit->hsps; + my %hsps_i; + require 
Tie::RefHash; + tie %hsps_i, 'Tie::RefHash'; + @hsps_i{@hsps} = (0..$#hsps); + my @mx; + foreach (0..$#map) { + my @hspx = ('') x @hsps; + my @these_hsps = @{$map[$_]->[1]}; + @hspx[@hsps_i{@these_hsps}] = ('*') x @these_hsps; + $mx[$_] = \@hspx; + } + untie %hsps_i; + + push @ret, "\tIntvl\n"; + push @ret, "HSPS\t", join ("\t", (0..$#map)), "\n"; + foreach my $h (0..$#hsps) { + push @ret, join("\t", $h, map { $mx[$_][$h] } (0..$#map) ),"\n"; + } + if ($legend_q) { + push @ret, "Interval legend\n"; + foreach (0..$#map) { + push @ret, sprintf("%d\t[%d, %d]\n", $_, @{$map[$_][0]}); + } + push @ret, "HSP legend\n"; + my @ints = get_intervals_from_hsps($type, @hsps); + foreach (0..$#hsps) { + push @ret, sprintf("%d\t[%d, %d]\n", $_, @{$ints[$_]}); + } + } + return @ret; +} + =head2 hsps Title : hsps @@ -389,13 +432,31 @@ sub strand{ my $self = shift; my $type = shift; - $type ||= 'query'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; + $self->_check_type_arg(\$type); $self->warn("Getter only") if @_; return $self->{"strand_$type"}; } +=head2 frame + + Title : frame + Usage : $tiling->frame + Function: Retrieve the frame value filtering the invocant's hit + Example : + Returns : value of strand (-2, -1, 0, +1, +2) + Args : + Note : getter only + +=cut + +sub frame{ + my $self = shift; + my $type = shift; + $self->_check_type_arg(\$type); + $self->warn("Getter only") if @_; + return $self->{"frame_$type"}; +} + =head2 "PRIVATE" METHODS =head2 _calc_coverage_map @@ -423,9 +484,7 @@ sub _calc_coverage_map { my $self = shift; my ($type) = @_; - $type ||= 'query'; - $self->throw("Unknown type '$type'") unless grep( /^$type$/, qw( hit subject query )); - $type = 'hit' if $type eq 'subject'; + $self->_check_type_arg(\$type); # obtain the [start, end] intervals for all hsps in the hit (relative # to the type) @@ -497,10 +556,10 @@ sub _calc_stats { my $self = shift; my ($type, $action) = @_; - 
$type ||= 'query'; + $self->_check_type_arg(\$type); + $action ||= 'exact'; - $self->throw("Unknown type '$type'") unless grep( /^$type$/, qw( hit subject query )); - $type = 'hit' if $type eq 'subject'; + $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); $self->_calc_coverage_map($type) unless $self->coverage_map($type); @@ -586,9 +645,7 @@ ### create the urns my $self = shift; my $type = shift; - $type ||= 'query'; - $self->throw("Unrecognized type '$type'") unless - ( grep /^$type$/, qw( hit subject query ) ); + $self->_check_type_arg(\$type); # initialize the urns my @urns = map { [0, $$_[1]] } $self->coverage_map($type); @@ -653,9 +710,8 @@ sub _tiling_iterator { my $self = shift; my $type = shift; - $type ||= 'query'; - $self->throw("Unknown type '$type'") unless grep(/^$type$/, qw( hit query subject )); - $type = 'hit' if $type eq 'subject'; + $self->_check_type_arg(\$type); + if (!defined $self->{"_tiling_iterator_$type"}) { $self->_make_tiling_iterator($type); } @@ -706,18 +762,18 @@ return; } -=head2 _check_args +=head2 _check_new_args - Title : _check_args - Usage : _check_args($qstrand, $hstrand, $qframe, $hframe) + Title : _check_new_args + Usage : _check_new_args($qstrand, $hstrand, $qframe, $hframe) Function: Throw if strand/frame parms out of bounds or set uselessly for the underlying algorithm Returns : True on success Args : requested filter arguments to constructor =cut -no strict qw( refs ); -sub _check_args { + +sub _check_new_args { my ($self, $qstrand, $hstrand, $qframe, $hframe) = @_; $self->throw("Strand filter arguments must be +1 or -1") if ( $qstrand && !(abs($qstrand)==1) or @@ -736,5 +792,15 @@ return 1; } + +sub _check_type_arg { + my $self = shift; + my $typeref = shift; + $$typeref ||= 'query'; + $self->throw("Unknown type '$$typeref'") unless grep(/^$$typeref$/, qw( hit query subject )); + $$typeref = 'hit' if $$typeref eq 'subject'; + return 1; +} + 1; From fangly at dev.open-bio.org Fri May 
22 21:02:38 2009 From: fangly at dev.open-bio.org (Florent E Angly) Date: Fri, 22 May 2009 21:02:38 -0400 Subject: [Bioperl-guts-l] [15707] bioperl-live/trunk/Bio/Assembly/Tools/ContigSpectrum.pm: Added function to score a contig sprectrum Message-ID: <200905230102.n4N12cYO007758@dev.open-bio.org> Revision: 15707 Author: fangly Date: 2009-05-22 21:02:38 -0400 (Fri, 22 May 2009) Log Message: ----------- Added function to score a contig sprectrum Modified Paths: -------------- bioperl-live/trunk/Bio/Assembly/Tools/ContigSpectrum.pm Modified: bioperl-live/trunk/Bio/Assembly/Tools/ContigSpectrum.pm =================================================================== --- bioperl-live/trunk/Bio/Assembly/Tools/ContigSpectrum.pm 2009-05-22 22:00:24 UTC (rev 15706) +++ bioperl-live/trunk/Bio/Assembly/Tools/ContigSpectrum.pm 2009-05-23 01:02:38 UTC (rev 15707) @@ -74,6 +74,9 @@ -cross => $mixed_csp ); print "The cross contig spectrum is ".$cross_csp->to_string."\n"; + # Score a contig spectrum (the more abundant the contigs and the larger their + # size, the larger the score) + =head1 DESCRIPTION @@ -781,6 +784,55 @@ } +=head2 average + + Title : score + Usage : my $score = $csp->score(); + Function: Score a contig spectrum (or cross-contig spectrum) such that the + higher the number of contigs (or cross-contigs) and the larger their + size, the higher the score. + Let n : total number of sequences + c_q : number of contigs of size q + q : number of sequence in a contig + We define: score = n/(n-1) * (X - 1/n) + where X = sum ( c_q * q^2 ) / n**2 + The score ranges from 0 (singlets only) to 1 (a single large contig) + It is possible to specify a value for the number of sequences to + assume in the contig spectrum. 
+ Returns : contig score + Args : number of total sequences to assume [optional] + +=cut + +sub score { + my ($self, $nof_seqs) = @_; + # Main + my $score = 0; + my $n = $self->nof_seq; + if ( $n > 0 ) { + # Contig spectrum info + my $q_max = $self->max_size; + my $spec = $self->spectrum; + # Adjust number of 1-contigs + if ( $nof_seqs ) { + $spec->{'1'} += $nof_seqs - $n; + $n = $nof_seqs; + } + # Calculate X + for my $q ( 1 .. $q_max ) { + if ( $spec->{$q} ) { + my $c_q = $spec->{$q}; + $score += $c_q * $q ** 2; + } + } + $score /= $n ** 2; + } + # Rescale X to obtain the score + $score = $n/($n-1) * ($score - 1/$n); + return $score; +} + + =head2 _naive_assembler Title : _naive_assembler From maj at dev.open-bio.org Fri May 22 23:51:36 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Fri, 22 May 2009 23:51:36 -0400 Subject: [Bioperl-guts-l] [15708] bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm: expanded prescribed methods to include most of those Message-ID: <200905230351.n4N3paD9008034@dev.open-bio.org> Revision: 15708 Author: maj Date: 2009-05-22 23:51:36 -0400 (Fri, 22 May 2009) Log Message: ----------- expanded prescribed methods to include most of those analogous to the HSP::HSPI statistics methods also expanded the intro POD to put a bunch of algorithm-specific report information all in one place, as an aid to devs Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-23 01:02:38 UTC (rev 15707) +++ bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-23 03:51:36 UTC (rev 15708) @@ -18,14 +18,39 @@ =head1 SYNOPSIS -Not used directly. +Not used directly. Useful POD here for developers, however. 
+The interface is desgined to make the following code conversion as +simple as possible: + +From: + + # Bio::Search::SearchUtils-based + while ( local $_ = $result->next_hit ) { + printf( "E-value: %g; Fraction aligned: %f; Number identical: %d\n", + $hit->significance, $hit->frac_aligned_query, $hit->num_identical); + } + +To: + + # TilingI-based + while ( local $_ = $result->next_hit ) { + my $tiling = Bio::Search::Tiling::MyTiling($_); + printf( "E-value: %g; Fraction aligned: %f; Number identical: %d\n", + $hit->significance, $tiling->frac_aligned_query, $tiling->num_identical); + } + + + =head1 DESCRIPTION This module provides strong suggestions for any intended HSP tiling object implementation. An object subclassing TilingI should override the methods defined here according to their descriptions below. +See the section STATISTICS METHODS for hints on implementing methods +that are valid across different algorithms and report types. + =head1 FEEDBACK =head2 Mailing Lists @@ -79,47 +104,65 @@ use base qw(Bio::Root::Root); -=head2 next_tiling +=head2 STATISTICS METHODS - Title : next_tiling - Usage : @hsps = $self->next_tiling($type); - Function: Obtain a tiling of HSPs over the $type ('hit', 'subject', - 'query') sequence - Example : - Returns : an array of HSPI objects - Args : scalar $type: one of 'hit', 'subject', 'query', with - 'subject' an alias for 'hit' +The tiling statistics can be thought of as global counterparts to +similar statistics defined for the individual HSPs. We therefore +prescribe definitions for many of the synonymous methods defined in +L. -=cut +The tiling statistics must be able to keep track of the coordinate +systems in which both the query and subject sequences exist; i.e., +either nucleotide or amino acid. This information is typically +inferred from the name of the algorithm used to perform the original +search (contained in C<$hit_object-E<gt>algorithm>). Here is a table +of algorithm information that may be useful (if you trust us).
-sub next_tiling{ - my ($self,$type,@args) = @_; - $self->throw_not_implemented; -}
+   algorithm   query on hit   coordinates(q/h)
+   ---------   ------------   ---------------
+   blastn      dna on dna     dna/dna
+   blastp      aa on aa       aa/aa
+   blastx      xna on aa      dna/aa
+   tblastn     aa on xna      aa/dna
+   tblastx     xna on xna     dna/dna
+   fasta       dna on dna     dna/dna
+   fasta       aa on aa       aa/aa
+   fastx       xna on aa      dna/aa
+   fasty       xna on aa      dna/aa
+   tfasta      aa on xna      aa/dna
+   tfasty      aa on xna      aa/dna
+   megablast   dna on dna     dna/dna
-=head2 rewind_tilings + xna: translated nucleotide data - Title : rewind_tilings - Usage : $self->rewind_tilings($type) - Function: Reset the next_tilings($type) iterator - Example : - Returns : True on success - Args : scalar $type: one of 'hit', 'subject', 'query', with - 'subject' an alias for 'hit' +Statistics methods must also be aware of differences in reporting +among the algorithms. Hit attributes are not necessarily normalized +over all algorithms. Devs, please feel free to add examples to the +list below. -=cut +=over -sub rewind_tilings{ - my ($self, $type, @args) = @_; - $self->throw_not_implemented; -} +=item NCBI BLAST vs WU-BLAST (AB-BLAST) lengths -#alias -sub rewind { shift->rewind_tilings(@_) } +The total length of the alignment is reported differently between these two flavors. C<$hit_object-E<gt>length()> will contain the number in the denominator of the stats line; i.e., 120 in + Identical = 34/120 Positives = 67/120 + +NCBI BLAST uses the total length of the query sequence as input by the user (a.k.a. "with gaps"). WU-BLAST uses the length of the query sequence actually aligned by the algorithm (a.k.a. "without gaps"). + +=back + +Finally, developers should remember that sequence data may or may not +be associated with the HSPs contained in the hit object. This will +typically depend on whether a full report (e.g, C) or a +summary (e.g., C) was parsed. Statistics methods that +depend directly on the sequence data will need to check that +that data is present.
+ =head2 identities Title : identities + Alias : num_identical Usage : $num_identities = $tiling->identities() Function: Return the estimated or exact number of identities in the tiling, accounting for overlapping HSPs @@ -134,9 +177,13 @@ $self->throw_not_implemented; } +#HSPI synonym +sub num_identical { shift->identities( @_ ) } + =head2 conserved Title : conserved + Alias : num_conserved Usage : $num_conserved = $tiling->conserved() Function: Return the estimated or exact number of conserved sites in the tiling, accounting for overlapping HSPs @@ -151,14 +198,16 @@ $self->throw_not_implemented; } +#HSPI synonym +sub num_conserved { sub shift->conserved( @_ ) } + =head2 length Title : length Usage : $max_length = $tiling->length($type) Function: Return the total number of residues of the subject or query sequence covered by the tiling - Example : - Returns : + Returns : number of "raw" residues covered (see logical_length() ) Args : scalar $type, one of 'hit', 'subject', 'query' =cut @@ -168,9 +217,172 @@ $self->throw_not_implemented; } +=head2 frac_identical + + Title : frac_identical + Usage : $tiling->frac_identical($type) + Function: Return the fraction of sequence length consisting + of identical pairs + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + Note : This method must take account of the $type coordinate + system and the length reporting method (see STATISTICS + METHODS above) -# -# more desired methods here as nec -# +=cut +sub frac_identical { + my ($self, $type, @args) = @_; + $self->throw_not_implemented; +} + +=head2 percent_identity + + Title : percent_identity + Usage : $tiling->percent_identity($type) + Function: Return the fraction of sequence length consisting + of identical pairs as a percentage + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + +=cut + +sub percent_identity { + my ($self, $type, @args) = @_; + return $self->frac_identical($type, @args) * 100; +} + +=head2 
frac_conserved + + Title : frac_conserved + Usage : $tiling->frac_conserved($type) + Function: Return the fraction of sequence length consisting + of conserved pairs + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + Note : This method must take account of the $type coordinate + system and the length reporting method (see STATISTICS + METHODS above) + +=cut + +sub frac_conserved{ + my ($self, $type, @args) = @_; + $self->throw_not_implemented; +} + +=head2 percent_conserved + + Title : percent_conserved + Usage : $tiling->percent_conserved($type) + Function: Return the fraction of sequence length consisting + of conserved pairs as a percentage + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + +=cut + +sub percent_conserved { + my ($self, $type, @args) = @_; + return $self->frac_conserved($type, @args) * 100; +} + + +=head2 frac_aligned + + Title : frac_aligned + Usage : $tiling->frac_aligned($type) + Function: Return the fraction of B sequence length consisting + that was aligned by the algorithm + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + Note : This method must take account of the $type coordinate + system and the length reporting method (see STATISTICS + METHODS above) + +=cut + +sub frac_aligned{ + my ($self, $type, @args) = @_; + $self->throw_not_implemented; +} + +# aliases for back compat +sub frac_aligned_query { shift->frac_aligned('query', @_) } +sub frac_aligned_hit { shift->frac_aligned('hit', @_) } + +=head2 range + + Title : range + Usage : $tiling->range($type) + Function: Returns the extent of the longest tiling + as ($start_coord, $end_coord) + Returns : array of two scalar integers + Args : scalar $type, one of 'hit', 'subject', 'query' + +=cut + +sub range { + my ($self, $type, @args) = @_; + $self->throw_not_implemented; +} + +=head2 logical_length + + Title : logical_length + Usage : $tiling->logical_length($type) + Function: Get the logical 
length of the hit sequence, + i.e., the length of the pretranslated nucleotide + sequence if necessary. + Returns : scalar integer + Argument: scalar $type, one of 'hit', 'subject', 'query' + Comments : This is a key internal function for the frac_* methods. + +=cut + +sub logical_length{ + my ($self, $type, @args) = @_; + $self->throw_not_implemented; +} + +=head2 TILING ITERATORS + +=head2 next_tiling + + Title : next_tiling + Usage : @hsps = $self->next_tiling($type); + Function: Obtain a tiling of HSPs over the $type ('hit', 'subject', + 'query') sequence + Example : + Returns : an array of HSPI objects + Args : scalar $type: one of 'hit', 'subject', 'query', with + 'subject' an alias for 'hit' + +=cut + +sub next_tiling{ + my ($self,$type,@args) = @_; + $self->throw_not_implemented; +} + +=head2 rewind_tilings + + Title : rewind_tilings + Usage : $self->rewind_tilings($type) + Function: Reset the next_tilings($type) iterator + Example : + Returns : True on success + Args : scalar $type: one of 'hit', 'subject', 'query', with + 'subject' an alias for 'hit' + +=cut + +sub rewind_tilings{ + my ($self, $type, @args) = @_; @@ Diff output truncated at 10000 characters. @@ From maj at dev.open-bio.org Mon May 25 09:57:21 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Mon, 25 May 2009 09:57:21 -0400 Subject: [Bioperl-guts-l] [15709] bioperl-dev/trunk/Bio/Search/Tiling: Working out coordinate system issues (dna v. Message-ID: <200905251357.n4PDvLY0032739@dev.open-bio.org> Revision: 15709 Author: maj Date: 2009-05-25 09:57:20 -0400 (Mon, 25 May 2009) Log Message: ----------- Working out coordinate system issues (dna v. xna v.
aa) Filling out interface+instantiated methods: frac_*, num_* Small refactors Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-23 03:51:36 UTC (rev 15708) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-25 13:57:20 UTC (rev 15709) @@ -29,6 +29,9 @@ $subject_length = $tiling->length('subject'); # or... $subject_length = $tiling->length('hit'); + # get a visual on the coverage map + print $tiling->coverage_map_as_text('query','LEGEND'); + # tilings @covering_hsps_for_subject = $tiling->next_tiling('subject'); @covering_hsps_for_query = $tiling->next_tiling('query'); @@ -49,8 +52,18 @@ interval decomposition I'm calling the "coverage map". Internal object methods compute the various statistics, which are then stored in appropriately-named public object attributes. See -L for more info on the algorithm. +L for more info on the algorithm. +=head1 DESIGN NOTE + +The major calculations are made just-in-time, and then memoized. So, +for example, for a given MapTiling object, a coverage map would +usually be calculated only once (for the query), and at most twice (if +the subject perspective is also desired), and then only when a +statistic is first accessed. Afterward, the map and/or any statistic +is read from storage. So feel free to call the statistic methods +frequently if it suits you. + =head1 FEEDBACK =head2 Mailing Lists @@ -109,27 +122,35 @@ use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); # fast, clear, nasty, brutish and short. 
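The DESIGN NOTE added in revision 15709 says that the statistics are computed just-in-time and then memoized in the object. The following is a minimal sketch of that compute-once pattern. It assumes a hypothetical `MemoStats` class whose `_calc_stats` stands in for the expensive coverage-map calculation. The placeholder values are invented, and this is not MapTiling's actual code.

```perl
use strict;
use warnings;

package MemoStats;    # hypothetical class, for illustration only

sub new {
    my $class = shift;
    return bless { calc_count => 0 }, $class;
}

# Stand-in for the expensive step (e.g. building a coverage map).
# Fills every statistic for one ($type, $action) pair in a single pass.
sub _calc_stats {
    my ($self, $type, $action) = @_;
    $self->{calc_count}++;
    $self->{"identities_${type}_${action}"} = 42;    # placeholder value
    $self->{"length_${type}_${action}"}     = 100;   # placeholder value
    return;
}

# Accessor: calculate on first access, afterwards read from storage.
sub identities {
    my ($self, $type, $action) = @_;
    $type   ||= 'query';
    $action ||= 'exact';
    $self->_calc_stats($type, $action)
        unless defined $self->{"identities_${type}_${action}"};
    return $self->{"identities_${type}_${action}"};
}

package main;

my $stats = MemoStats->new;
print $stats->identities('query'), "\n" for 1 .. 3;   # prints 42 three times
print "calculations: $stats->{calc_count}\n";          # prints calculations: 1
```

The sketch keys each cache slot on both `$type` and `$action`, mirroring the `"identities_${type}_${action}"` keys in the diff. As a result, each perspective and action is calculated at most once.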
-# for _allowable_filters() +# for _allowable_filters(), _set_mapping() # covers BLAST, FAST families # FASTA is ambiguous (nt or aa) based on alg name only -my $filter_lookup = { +my $alg_lookup = { 'N' => { 'q' => qr/[s]/, - 'h' => qr/[s]/ }, + 'h' => qr/[s]/, + 'mapping' => [1,1]}, 'P' => { 'q' => '', - 'h' => '' }, + 'h' => '', + 'mapping' => [1,1] }, 'X' => { 'q' => qr/[sf]/, - 'h' => '' }, + 'h' => '', + 'mapping' => [3, 1]}, 'Y' => { 'q' => qr/[sf]/, - 'h' => '' }, + 'h' => '', + 'mapping' => [3, 1]}, 'TA' => { 'q' => '', - 'h' => qr/[sf]/ }, + 'h' => qr/[sf]/, + 'mapping' => [1, 3]}, 'TN' => { 'q' => '', - 'h' => qr/[sf]/ }, + 'h' => qr/[sf]/, + 'mapping' => [1, 3]}, 'TX' => { 'q' => qr/[sf]/, - 'h' => qr/[sf]/ }, + 'h' => qr/[sf]/, + 'mapping' => [3, 3]}, # correct? 'TY' => { 'q' => qr/[sf]/, - 'h' => qr/[sf]/ } + 'h' => qr/[sf]/, + 'mapping' => [3, 3]} }; @@ -145,7 +166,12 @@ filtering args for nucleotide data: -qstrand => [[ 1 | -1 ]] -hstrand => [[ 1 | -1 ]] - (frame specs to come, hopefully) + -qframe => [[ -2 | -1 | 0 | 1 | 2 ]] + -hframe => [[ -2 | -1 | 0 | 1 | 2 ]] + Note : Not all filters are valid for all BLAST/FAST + algorithms. The constructor will warn when, + e.g., -qstrand is set for BLASTP data. 
+ =cut @@ -170,6 +196,7 @@ } $self->warn("No HSPs present in hit after filtering") unless (@hsps); $self->hsps(\@hsps); + $self->_set_mapping(); $self->{"strand_query"} = $qstrand; $self->{"strand_hit"} = $hstrand; $self->{"frame_query"} = $qframe; @@ -221,7 +248,7 @@ return $self->_tiling_iterator($type)->('REWIND'); } -=head2 ACCESSORS +=head2 STATISTICS =head2 identities @@ -241,8 +268,7 @@ my $self = shift; my ($type, $action) = @_; $self->_check_type_arg(\$type); - $action ||= 'exact'; - $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); + $self->_check_action_arg(\$action); if (!defined $self->{"identities_${type}_${action}"}) { $self->_calc_stats($type, $action); } @@ -267,8 +293,7 @@ my $self = shift; my ($type, $action) = @_; $self->_check_type_arg(\$type); - $action ||= 'exact'; - $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); + $self->_check_action_arg(\$action); if (!defined $self->{"conserved_${type}_${action}"}) { $self->_calc_stats($type, $action); } @@ -279,7 +304,8 @@ Title : length Usage : $tiling->length($type, $action) - Function: Retrieve the total length in residues for the invocant + Function: Retrieve the total length of aligned residues for + the seq $type Example : Returns : value of length (a scalar) Args : scalar $type: one of 'hit', 'subject', 'query' @@ -294,15 +320,211 @@ my $self = shift; my ($type,$action) = @_; $self->_check_type_arg(\$type); - - $action ||= 'exact'; - $self->throw("Unknown action '$action'") unless grep(/^$action$/, qw( exact est max )); + $self->_check_action_arg(\$action); if (!defined $self->{"length_${type}_${action}"}) { $self->_calc_stats($type, $action); } return $self->{"length_${type}_${action}"}; } +=head2 frac_identical + + Title : frac_identical + Usage : $tiling->frac_identical($type, $denom) + Function: Return the fraction of sequence length consisting + of identical pairs, with respect to $denom + Returns : scalar float + 
Args : scalar $type, one of 'hit', 'subject', 'query' + scalar $denom, one of 'total', 'aligned' + Note : $denom == 'aligned', return identities/num_aligned + $denom == 'total', return identities/_reported_length + (i.e., length of the original input sequences) + +=cut + +sub frac_identical { + my ($self, $type, $denom) = @_; + if (@_ == 1) { + _check_type_arg(\$type); # set default + $denom = 'total'; # is this the right default? + } + elsif (@_ == 2) { + if (grep /^$type$/, qw( query hit subject )) { + $denom = 'total'; + } + elsif (grep /^$type$/, qw( total aligned )) { + $denom = $type; + $type = ''; + _check_type_arg(\$type); # set default + } + else { + $self->throw("Can't understand argument '$type'"); + } + } + else { + _check_type_arg(\$type); + unless (grep /^$denom/, qw( total aligned )) { + $self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'"); + } + } + if (!defined $self->{"frac_identical_${type}_${denom}"}) { + for ($denom) { + /total/ && do { + return $self->{"frac_identical_${type}_${denom}"} = + $self->identities($type)/$self->length($type); + }; + /aligned/ && do { + return $self->{"frac_identical_${type}_${denom}"} = + $self->identities($type)/$self->_reported_length($type); + }; + do { + $self->throw("What are YOU doing here?"); + }; + } + } +} + +=head2 frac_conserved + + Title : frac_conserved + Usage : $tiling->frac_conserved($type, $denom) + Function: Return the fraction of sequence length consisting + of conserved pairs, with respect to $denom + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + scalar $denom, one of 'total', 'aligned' + Note : $denom == 'aligned', return conserved/num_aligned + $denom == 'total', return conserved/_reported_length + (i.e., length of the original input sequences) + +=cut + +sub frac_conserved{ + my ($self, $type, $denom) = @_; + if (@_ == 1) { + _check_type_arg(\$type); # set default + $denom = 'total'; # is this the right default? 
+ } + elsif (@_ == 2) { + if (grep /^$type$/, qw( query hit subject )) { + $denom = 'total'; + } + elsif (grep /^$type$/, qw( total aligned )) { + $denom = $type; + $type = ''; + _check_type_arg(\$type); # set default + } + else { + $self->throw("Can't understand argument '$type'"); + } + } + else { + _check_type_arg(\$type); + unless (grep /^$denom/, qw( total aligned )) { + $self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'"); + } + } + if (!defined $self->{"frac_conserved_${type}_${denom}"}) { + for ($denom) { + /total/ && do { + return $self->{"frac_conserved_${type}_${denom}"} = + $self->conserved($type)/$self->length($type); + }; + /aligned/ && do { + return $self->{"frac_conserved_${type}_${denom}"} = + $self->conserved($type)/$self->_reported_length($type); + }; + do { + $self->throw("What are YOU doing here?"); + }; + } + } +} + +=head2 frac_aligned + + Title : frac_aligned + Usage : $tiling->frac_aligned($type) + Function: Return the fraction of input sequence length + that was aligned by the algorithm + Returns : scalar float + Args : scalar $type, one of 'hit', 'subject', 'query' + +=cut + +sub frac_aligned{ + my ($self, $type, @args) = @_; + _check_type_arg(\$type); + if (!$self->{"frac_aligned_${type}"}) { + $self->{"frac_aligned_${type}"} = $self->num_aligned($type)/$self->_reported_length($type); + } + return $self->{"frac_aligned_${type}"}; +} + +=head2 num_aligned + + Title : num_aligned + Usage : $tiling->num_aligned($type) + Function: Return the number of residues of sequence $type + that were aligned by the algorithm + Returns : scalar int + Args : scalar $type, one of 'hit', 'subject', 'query' + Note : Since this is calculated from reported coordinates, + not symbol string counts, it is already in terms of + "logical length" + +=cut + +sub num_aligned { shift->length( @_ ) }; + +=head2 num_unaligned + + Title : num_unaligned + Usage : $tiling->num_unaligned($type) + Function: Return the number of residues of 
sequence $type + that were left unaligned by the algorithm + Returns : scalar int + Args : scalar $type, one of 'hit', 'subject', 'query' + Note : Since this is calculated from reported coordinates, + not symbol string counts, it is already in terms of + "logical length" + +=cut + +sub num_unaligned { + my $self = shift; + my $type = shift; + my $ret; + _check_type_arg(\$type); + if (!defined $self->{"num_unaligned_${type}"}) { @@ Diff output truncated at 10000 characters. @@ From maj at dev.open-bio.org Mon May 25 17:46:23 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Mon, 25 May 2009 17:46:23 -0400 Subject: [Bioperl-guts-l] [15710] bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm: bug-swatting Message-ID: <200905252146.n4PLkNvY001126@dev.open-bio.org> Revision: 15710 Author: maj Date: 2009-05-25 17:46:23 -0400 (Mon, 25 May 2009) Log Message: ----------- bug-swatting Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-25 13:57:20 UTC (rev 15709) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-25 21:46:23 UTC (rev 15710) @@ -345,7 +345,8 @@ sub frac_identical { my ($self, $type, $denom) = @_; if (@_ == 1) { - _check_type_arg(\$type); # set default + $type = ''; + $self->_check_type_arg(\$type); # set default $denom = 'total'; # is this the right default? 
} elsif (@_ == 2) { @@ -355,14 +356,14 @@ elsif (grep /^$type$/, qw( total aligned )) { $denom = $type; $type = ''; - _check_type_arg(\$type); # set default + $self->_check_type_arg(\$type); # set default } else { $self->throw("Can't understand argument '$type'"); } } else { - _check_type_arg(\$type); + $self->_check_type_arg(\$type); unless (grep /^$denom/, qw( total aligned )) { $self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'"); } @@ -370,18 +371,21 @@ if (!defined $self->{"frac_identical_${type}_${denom}"}) { for ($denom) { /total/ && do { - return $self->{"frac_identical_${type}_${denom}"} = - $self->identities($type)/$self->length($type); + $self->{"frac_identical_${type}_${denom}"} = + $self->identities($type)/$self->_reported_length($type); + last; }; /aligned/ && do { - return $self->{"frac_identical_${type}_${denom}"} = - $self->identities($type)/$self->_reported_length($type); + $self->{"frac_identical_${type}_${denom}"} = + $self->identities($type)/$self->length($type); + last; }; do { $self->throw("What are YOU doing here?"); }; } } + return $self->{"frac_identical_${type}_${denom}"}; } =head2 frac_conserved @@ -402,7 +406,8 @@ sub frac_conserved{ my ($self, $type, $denom) = @_; if (@_ == 1) { - _check_type_arg(\$type); # set default + $type = ''; + $self->_check_type_arg(\$type); # set default $denom = 'total'; # is this the right default? 
} elsif (@_ == 2) { @@ -412,14 +417,14 @@ elsif (grep /^$type$/, qw( total aligned )) { $denom = $type; $type = ''; - _check_type_arg(\$type); # set default + $self->_check_type_arg(\$type); # set default } else { $self->throw("Can't understand argument '$type'"); } } else { - _check_type_arg(\$type); + $self->_check_type_arg(\$type); unless (grep /^$denom/, qw( total aligned )) { $self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'"); } @@ -427,18 +432,22 @@ if (!defined $self->{"frac_conserved_${type}_${denom}"}) { for ($denom) { /total/ && do { - return $self->{"frac_conserved_${type}_${denom}"} = - $self->conserved($type)/$self->length($type); + $self->{"frac_conserved_${type}_${denom}"} = + $self->conserved($type)/$self->_reported_length($type); + last; }; /aligned/ && do { - return $self->{"frac_conserved_${type}_${denom}"} = - $self->conserved($type)/$self->_reported_length($type); + $self->{"frac_conserved_${type}_${denom}"} = + $self->conserved($type)/$self->length($type); + last; }; do { $self->throw("What are YOU doing here?"); + last; }; } } + return $self->{"frac_conserved_${type}_${denom}"}; } =head2 frac_aligned @@ -454,7 +463,7 @@ sub frac_aligned{ my ($self, $type, @args) = @_; - _check_type_arg(\$type); + $self->_check_type_arg(\$type); if (!$self->{"frac_aligned_${type}"}) { $self->{"frac_aligned_${type}"} = $self->num_aligned($type)/$self->_reported_length($type); } @@ -495,7 +504,7 @@ my $self = shift; my $type = shift; my $ret; - _check_type_arg(\$type); + $self->_check_type_arg(\$type); if (!defined $self->{"num_unaligned_${type}"}) { $self->{"num_unaligned_${type}"} = $self->_reported_length($type)-$self->num_aligned($type); } @@ -516,9 +525,9 @@ sub range { my ($self, $type, @args) = @_; - _check_type_arg(\$type); + $self->_check_type_arg(\$type); my @a = $self->_contig_intersection($type); - return ($a[0]->[0], $a[-1]->[1]); + return ($a[0]->[0][0], $a[-1]->[0][1]); } @@ -692,7 +701,7 @@ sub mapping{ my 
$self = shift; my $type = shift; - _check_type_arg(\$type); + $self->_check_type_arg(\$type); return $self->{"_mapping_${type}"}; } @@ -1129,7 +1138,7 @@ if (!defined $self->{"_contig_intersection_${type}"}) { $self->_calc_coverage_map($type); } - return $self->{"_contig_intersection_${type}"}; + return @{$self->{"_contig_intersection_${type}"}}; } =head2 _reported_length @@ -1154,7 +1163,7 @@ sub _reported_length { my $self = shift; my $type = shift; - _check_type_arg(\$type); + $self->_check_type_arg(\$type); my $key = uc( $type."_LENGTH" ); return ($self->hsps)[0]->{$key}; } From maj at dev.open-bio.org Tue May 26 23:42:46 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 26 May 2009 23:42:46 -0400 Subject: [Bioperl-guts-l] [15711] bioperl-dev/trunk/t/SearchIO: blast. t modified to use MapTiling in place of Message-ID: <200905270342.n4R3gk4Q011802@dev.open-bio.org> Revision: 15711 Author: maj Date: 2009-05-26 23:42:45 -0400 (Tue, 26 May 2009) Log Message: ----------- blast.t modified to use MapTiling in place of hit object calls dependent on SearchUtils (2 extra tests correspond to two add'l use_ok()'s Modified Paths: -------------- bioperl-dev/trunk/t/SearchIO/Tiling.t Added Paths: ----------- bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t Modified: bioperl-dev/trunk/t/SearchIO/Tiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-25 21:46:23 UTC (rev 15710) +++ bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-27 03:42:45 UTC (rev 15711) @@ -3,6 +3,7 @@ use strict; BEGIN { use lib '.'; + use lib '../..'; use Bio::Root::Test; test_begin(-tests => 1000 ); } @@ -13,6 +14,7 @@ use_ok('Bio::Search::Hit::BlastHit'); use_ok('File::Spec'); +chdir('../..'); ok( my $parser = new Bio::SearchIO( -file=>test_input_file('dcr1_sp.WUBLASTP'), @@ -104,17 +106,46 @@ 'TFASTX' => undef ); +my %results; + foreach (keys %examples) { next unless $examples{$_}; ok( my $blio = Bio::SearchIO->new( 
-format=>$examples{$_}[0], -file =>test_input_file($examples{$_}[1])), "$_ data file"); - my $hit = $blio->next_result->next_hit; + my $hit = ($results{$_} = $blio->next_result)->next_hit; ok( $tiling = Bio::Search::Tiling::MapTiling->new($hit, @{$examples{$_}[2]}), "tiling object created for $_ hit"); dies_ok { Bio::Search::Tiling::MapTiling->new($hit, @{$examples{$_}[3]}) } "tiling object arg exception check for $_ hit"; 1; } +# tricky wu-blast +ok (my $blio = Bio::SearchIO->new( -format=>'blast', + -file=>test_input_file('tricky.wublast')), + 'tricky.wublast') +ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'tricky tiling'); +my @map = $tiling->coverage_map_as_text('query',1); +@map = $tiling->coverage_map_as_text('hit',1); +ok (my $blio = Bio::SearchIO->new( -format=>'blast', + -file=>test_input_file('frac_problems.blast')), + 'frac_problems.blast') +ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'frac_problems tiling'); + +ok (my $blio = Bio::SearchIO->new( -format=>'blast', + -file=>test_input_file('frac_problems.blast')), + 'frac_problems2.blast') +ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'frac_problems2 tiling'); + +ok (my $blio = Bio::SearchIO->new( -format=>'blast', + -file=>test_input_file('frac_problems.blast')), + 'frac_problems3.blast') +ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'frac_problems3 tiling'); + +# old blast.t tiling tests + + +1; + Added: bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t (rev 0) +++ bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t 2009-05-27 03:42:45 UTC (rev 15711) @@ -0,0 +1,1640 @@ +# -*-Perl-*- Test Harness script for Bioperl +# $Id: SearchIO_blast.t 14995 2008-11-16 06:20:00Z cjfields $ + +# convert to use MapTiling.t / maj + +use strict; +#chdir('../..');
+BEGIN { + use lib '.'; +# use lib '../..'; + use Bio::Root::Test; + + test_begin(-tests => 1095); # 1093 + two use_ok's + + use_ok('Bio::SearchIO'); + use_ok('Bio::Search::Tiling::MapTiling'); + use_ok('Bio::Search::Tiling::MapTileUtils'); + +} + +my ($searchio, $result,$iter,$hit,$hsp); + +my $tiling; + +$searchio = Bio::SearchIO->new('-format' => 'blast', + '-file' => test_input_file('ecolitst.bls')); + +$result = $searchio->next_result; + +is($result->database_name, 'ecoli.aa', 'database_name()'); +is($result->database_entries, 4289); +is($result->database_letters, 1358990); + +is($result->algorithm, 'BLASTP'); +like($result->algorithm_version, qr/^2\.1\.3/); +like($result->query_name, qr/gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I,\s+homoserine dehydrogenase I [Escherichia coli]/); +is($result->query_accession, 'AAC73113.1'); +is($result->query_gi, 1786183); +is($result->query_length, 820); +is($result->get_statistic('kappa'), '0.135'); +is($result->get_statistic('kappa_gapped'), '0.0410'); +is($result->get_statistic('lambda'), '0.319'); +is($result->get_statistic('lambda_gapped'), '0.267'); +is($result->get_statistic('entropy'), '0.383'); +is($result->get_statistic('entropy_gapped'), '0.140'); + +is($result->get_statistic('dbletters'), 1358990); +is($result->get_statistic('dbentries'), 4289); +is($result->get_statistic('effective_hsplength'), 47); +is($result->get_statistic('effectivespace'), 894675611); +is($result->get_parameter('matrix'), 'BLOSUM62'); +is($result->get_parameter('gapopen'), 11); +is($result->get_parameter('gapext'), 1); +is($result->get_statistic('S2'), '92'); +is($result->get_statistic('S2_bits'), '40.0'); +float_is($result->get_parameter('expect'), '1.0e-03'); +is($result->get_statistic('num_extensions'), '82424'); + + +my @valid = ( [ 'gb|AAC73113.1|', 820, 'AAC73113', '0', 1567, 4058], + [ 'gb|AAC76922.1|', 810, 'AAC76922', '1e-91', 332, 850], + [ 'gb|AAC76994.1|', 449, 'AAC76994', '3e-47', 184, 467]); +my $count = 0; +while( 
$hit = $result->next_hit ) { + my $d = shift @valid; + + is($hit->name, shift @$d); + is($hit->length, shift @$d); + is($hit->accession, shift @$d); + float_is($hit->significance, shift @$d); + is($hit->bits, shift @$d ); + is($hit->raw_score, shift @$d ); + + if( $count == 0 ) { + my $hsps_left = 1; + while( my $hsp = $hit->next_hsp ) { + is($hsp->query->start, 1); + is($hsp->query->end, 820); + is($hsp->hit->start, 1); + is($hsp->hit->end, 820); + is($hsp->length('total'), 820); + is($hsp->start('hit'), $hsp->hit->start); + is($hsp->end('query'), $hsp->query->end); + is($hsp->strand('sbjct'), $hsp->subject->strand);# alias for hit + float_is($hsp->evalue, 0.0); + is($hsp->score, 4058); + is($hsp->bits,1567); + is(sprintf("%.2f",$hsp->percent_identity), 98.29); + is(sprintf("%.4f",$hsp->frac_identical('query')), 0.9829); + is(sprintf("%.4f",$hsp->frac_identical('hit')), 0.9829); + is($hsp->gaps, 0); + $hsps_left--; + } + is($hsps_left, 0); + } + last if( $count++ > @valid ); +} +is(@valid, 0); + +$searchio = Bio::SearchIO->new('-format' => 'blast', + '-file' => test_input_file('ecolitst.wublastp')); + +$result = $searchio->next_result; + +is($result->database_name, 'ecoli.aa'); +is($result->database_letters, 1358990); +is($result->database_entries, 4289); +is($result->algorithm, 'BLASTP'); +like($result->algorithm_version, qr/^2\.0MP\-WashU/); +like($result->query_name, qr/gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I,\s+homoserine dehydrogenase I [Escherichia coli]/); +is($result->query_accession, 'AAC73113.1'); + +is($result->query_length, 820); +is($result->query_gi, 1786183); +is($result->get_statistic('kappa'), 0.136); +is($result->get_statistic('lambda'), 0.319); +is($result->get_statistic('entropy'), 0.384); +is($result->get_statistic('dbletters'), 1358990); +is($result->get_statistic('dbentries'), 4289); +is($result->get_parameter('matrix'), 'BLOSUM62'); +is($result->get_statistic('Frame+0_lambda_used'), '0.319'); 
+is($result->get_statistic('Frame+0_kappa_used'), '0.136'); +is($result->get_statistic('Frame+0_entropy_used'), '0.384'); + +is($result->get_statistic('Frame+0_lambda_computed'), '0.319'); +is($result->get_statistic('Frame+0_kappa_computed'), '0.136'); +is($result->get_statistic('Frame+0_entropy_computed'), '0.384'); + +is($result->get_statistic('Frame+0_lambda_gapped'), '0.244'); +is($result->get_statistic('Frame+0_kappa_gapped'), '0.0300'); +is($result->get_statistic('Frame+0_entropy_gapped'), '0.180'); + +@valid = ( [ 'gb|AAC73113.1|', 820, 'AAC73113', '0', 4141], + [ 'gb|AAC76922.1|', 810, 'AAC76922', '3.1e-86', 844], + [ 'gb|AAC76994.1|', 449, 'AAC76994', '2.8e-47', 483]); +$count = 0; + +#MT BEGIN +while( $hit = $result->next_hit ) { + my $d = shift @valid; + my $tiling = Bio::Search::Tiling::MapTiling->new($hit); + if ($count==1) { + # Test HSP contig data returned by SearchUtils::tile_hsps() + # Second hit has two hsps that overlap. + + # compare with the contig made by hand for these two contigs + # in t/data/contig-by-hand.wublastp + # (in this made-up file, the hsps from ecolitst.wublastp + # were aligned and contiged, and Length, Identities, Positives + # were counted, by a human (maj) ) + + my $hand_hit = Bio::SearchIO->new( + -format=>'blast', + -file=>test_input_file('contig-by-hand.wublastp') + )->next_result->next_hit; + my $hand_hsp = $hand_hit->next_hsp; + my @hand_qrng = $hand_hsp->range('query'); + my @hand_srng = $hand_hsp->range('hit'); + my @hand_matches = $hand_hit->matches; + +# my($qcontigs, $scontigs) = Bio::Search::SearchUtils::tile_hsps($hit); +# # Query contigs +# is($qcontigs->[0]->{'start'}, $hand_qrng[0]); +# is($qcontigs->[0]->{'stop'}, $hand_qrng[1]); +# is($qcontigs->[0]->{'iden'}, $hand_matches[0]); +# is($qcontigs->[0]->{'cons'}, $hand_matches[1]); +# # Subject contigs +# is($scontigs->[0]->{'start'}, $hand_srng[0]); +# is($scontigs->[0]->{'stop'}, $hand_srng[1]); +# is($scontigs->[0]->{'iden'}, $hand_matches[0]); +#
is($scontigs->[0]->{'cons'}, $hand_matches[1]); + + is(($tiling->range('query'))[0], $hand_qrng[0]); + is(($tiling->range('query'))[1], $hand_qrng[1]); + is(sprintf("%d",$tiling->identities('query')), $hand_matches[0]); + is(sprintf("%d",$tiling->conserved('query')), $hand_matches[1]); + is(($tiling->range('hit'))[0], $hand_srng[0]); + is(($tiling->range('hit'))[1], $hand_srng[1]); + is(sprintf("%d",$tiling->identities('hit')), $hand_matches[0]); + is(sprintf("%d",$tiling->conserved('hit')), $hand_matches[1]); + } +#MT END + is($hit->name, shift @$d); + is($hit->length, shift @$d); + is($hit->accession, shift @$d); + float_is($hit->significance, shift @$d); + is($hit->raw_score, shift @$d ); + + if( $count == 0 ) { + $hit->rewind; + my $hsps_left = 1; + while( my $hsp = $hit->next_hsp ) { + is($hsp->query->start, 1); + is($hsp->query->end, 820); + is($hsp->hit->start, 1); + is($hsp->hit->end, 820); @@ Diff output truncated at 10000 characters. @@ From maj at dev.open-bio.org Tue May 26 23:45:34 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 26 May 2009 23:45:34 -0400 Subject: [Bioperl-guts-l] [15712] bioperl-dev/trunk/Bio/Search/Tiling: now performs correct coordinate mapping for Message-ID: <200905270345.n4R3jYDU011833@dev.open-bio.org> Revision: 15712 Author: maj Date: 2009-05-26 23:45:34 -0400 (Tue, 26 May 2009) Log Message: ----------- now performs correct coordinate mapping for translated dna queries and/or subjects, based on algorithm name- depends on a rewritten HSPI::matches function based on the current HSPI::matches (see MapTileUtils.pm) some refactoring of helper methods- Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-27 03:42:45 UTC (rev 15711) +++ 
bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-27 03:45:34 UTC (rev 15712) @@ -6,7 +6,12 @@ BEGIN { our @ISA = qw( Exporter ); - our @EXPORT = qw( get_intervals_from_hsps interval_tiling decompose_interval ); + our @EXPORT = qw( get_intervals_from_hsps + interval_tiling + decompose_interval + _allowable_filters + _set_mapping + _mapping_coeff); } # tiling trials @@ -222,4 +227,282 @@ } return @ret; } + +# fast, clear, nasty, brutish and short. +# for _allowable_filters(), _set_mapping() +# covers BLAST, FAST families +# FASTA is ambiguous (nt or aa) based on alg name only + +my $alg_lookup = { + 'N' => { 'q' => qr/[s]/, + 'h' => qr/[s]/, + 'mapping' => [1,1]}, + 'P' => { 'q' => '', + 'h' => '', + 'mapping' => [1,1] }, + 'X' => { 'q' => qr/[sf]/, + 'h' => '', + 'mapping' => [3, 1]}, + 'Y' => { 'q' => qr/[sf]/, + 'h' => '', + 'mapping' => [3, 1]}, + 'TA' => { 'q' => '', + 'h' => qr/[sf]/, + 'mapping' => [1, 3]}, + 'TN' => { 'q' => '', + 'h' => qr/[sf]/, + 'mapping' => [1, 3]}, + 'TX' => { 'q' => qr/[sf]/, + 'h' => qr/[sf]/, + 'mapping' => [3, 3]}, # correct? 
+ 'TY' => { 'q' => qr/[sf]/, + 'h' => qr/[sf]/, + 'mapping' => [3, 3]} +}; + +=head2 _allowable_filters + + Title : _allowable_filters + Usage : _allowable_filters($Bio_Search_Hit_HitI, $type) + Function: Return the HSP filters (strand, frame) allowed, + based on the reported algorithm + Returns : String encoding allowable filters: + s = strand, f = frame + Empty string if no filters allowed + undef if algorithm unrecognized + Args : A Bio::Search::Hit::HitI object, + scalar $type, one of 'hit', 'subject', 'query'; + default is 'query' + +=cut + +sub _allowable_filters { + my $hit = shift; + my $type = shift; + $type ||= 'q'; + unless (grep /^$type$/, qw( h q s ) ) { + warn("Unknown type '$type'; returning ''"); + return ''; + } + $type = 'h' if $type eq 's'; + for ($hit->algorithm) { + /MEGABLAST/i && do { + return qr/[s]/; + }; + /(.?)BLAST(.?)/i && do { + return $$alg_lookup{$1.$2}{$type}; + }; + /(.?)FAST(.?)/ && do { + return $$alg_lookup{$1.$2}{$type}; + }; + do { # unrecognized + last; + }; + } + return; +} + +=head2 _set_mapping + + Title : _set_mapping + Usage : $tiling->_set_mapping() + Function: Sets the "mapping" attribute for invocant + according to algorithm name + Returns : Mapping arrayref as set + Args : none + Note : See mapping() for explanation of this attribute + +=cut + +sub _set_mapping { + my $self = shift; + my $alg = $self->hit->algorithm; + + for ($alg) { + /MEGABLAST/i && do { + ($self->{_mapping_query},$self->{_mapping_hit}) = (1,1); + last; + }; + /(.?)BLAST(.?)/i && do { + ($self->{_mapping_query},$self->{_mapping_hit}) = + @{$$alg_lookup{$1.$2}{mapping}}; + last; + }; + /(.?)FAST(.?)/ && do { + ($self->{_mapping_query},$self->{_mapping_hit}) = + @{$$alg_lookup{$1.$2}{mapping}}; + last; + }; + do { # unrecognized + $self->warn("Unrecognized algorithm '$alg'; returning (1,1)"); + ($self->{_mapping_query},$self->{_mapping_hit}) = (1,1); + last; + }; + } + return ($self->{_mapping_query},$self->{_mapping_hit}); +} + +sub _mapping_coeff { 
+ my $obj = shift; + my $type = shift; + my %type_i = ( 'query' => 0, 'hit' => 1 ); + unless ( ref($obj) && $obj->can('algorithm') ) { + $obj->warn("Object type unrecognized"); + return undef; + } + $type ||= 'query'; + unless ( grep(/^$type$/, qw( query hit subject ) ) ) { + $obj->warn("Sequence type unrecognized"); + return undef; + } + $type = 'hit' if $type eq 'subject'; + + for ($obj->algorithm) { + /MEGABLAST/i && do { + return 1; + }; + /(.?)BLAST(.?)/i && do { + return $$alg_lookup{$1.$2}{'mapping'}[$type_i{$type}]; + }; + /(.?)FAST(.?)/ && do { + return $$alg_lookup{$1.$2}{'mapping'}[$type_i{$type}]; + }; + do { # unrecognized + last; + }; + } + return; +} + 1; +# need our own subsequencer for hsps. + +package Bio::Search::HSP::HSPI; + +use strict; +use warnings; + +=head2 matches_MT + + Title : matches_MT + Usage : $hsp->matches($type, $action, $start, $end) + Purpose : Get the total number of identical or conserved matches + in the query or sbjct sequence for the given HSP. Optionally can + report data within a defined interval along the seq. + Returns : scalar int + Args : + Comments : Relies on seq_str('match') to get the string of alignment symbols + between the query and sbjct lines which are used for determining + the number of identical and conservative matches. 
+ Note : Modeled on Bio::Search::HSP::HSPI::matches + +=cut + +sub matches_MT { + my( $self, @args ) = @_; + my($type, $action, $beg, $end) = $self->_rearrange( [qw(TYPE ACTION START END)], @args); + my @actions = qw( identities conserved searchutils ); + + # prep $type + $self->throw("Type not specified") if !defined $type; + $self->throw("Type '$type' unrecognized") unless grep(/^$type$/,qw(query hit subject)); + $type = 'hit' if $type eq 'subject'; + + # prep $action + $self->throw("Action not specified") if !defined $action; + $self->throw("Action '$action' unrecognized") unless grep(/^$action$/, @actions); + + if ( (!defined($beg) && !defined($end)) ) { + ## Get data for the whole alignment. + for ($action) { + $_ eq 'identities' && do { + return $self->num_identical; + }; + $_ eq 'conserved' && do { + return $self->num_conserved; + }; + $_ eq 'searchutils' && do { + return ($self->num_identical, $self->num_conserved); + }; + do { + $self->throw("What are YOU doing here?"); + }; + } + } + elsif (!$self->seq_str('match')) { + $self->warn("Sequence data not present in report; returning data for entire HSP"); + for ($action) { + $_ eq 'identities' && do { + return $self->num_identical; + }; + $_ eq 'conserved' && do { + return $self->num_conserved; + }; + $_ eq 'searchutils' && do { + return ($self->num_identical, $self->num_conserved); + }; + do { + $self->throw("What are YOU doing here?"); + }; + } + } + elsif ((defined $beg && !defined $end) || (!defined $beg && defined $end)) { + $self->throw("Both start and end are required"); + } + else { + ## Get the substring representing the desired sub-section of aln. + my($start,$stop) = $self->range($type); + if ( $beg < $start or $stop < $end ) { + $self->throw("Start/stop out of range [$start, $stop]"); + } + + # now with gap handling! 
/maj + my $match_str = $self->seq_str('match'); + if ($self->gaps) { + # strip the homology string of gap positions relative + # to the target type + $match_str = $self->seq_str('match'); + my $tgt = $self->seq_str($type); + my $encode = $match_str ^ $tgt; + my $zap = '-'^' '; + $encode =~ s/$zap//g; + $tgt =~ s/-//g; + $match_str = $tgt ^ $encode; + # match string is now the correct length for substr'ing below, + # given that start and end are gapless coordinates in the + # blast report + } + + my $seq = ""; + $seq = substr( $match_str, + int( ($beg-$start)/Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self, $type) ), + int( ($end-$beg+1)/Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self, $type) ) + ); + + if(!CORE::length $seq) { + $self->throw("Undefined sub-sequence ($beg,$end). Valid range = $start - $stop"); + } + + $seq =~ s/ //g; # remove space (no info). + my $len_cons = (CORE::length $seq)*(Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self,$type)); + $seq =~ s/\+//g; # remove '+' characters (conservative substitutions) + my $len_id = (CORE::length $seq)*(Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self,$type)); + for ($action) { + $_ eq 'identities' && do { + return $len_id; + }; + $_ eq 'conserved' && do { + return $len_cons; + }; + $_ eq 'searchutils' && do { + return ($len_id, $len_cons); + }; + do { + $self->throw("What are YOU doing here?"); + }; + } + } +} + + +1; Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-27 03:42:45 UTC (rev 15711) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-27 03:45:34 UTC (rev 15712) @@ -121,39 +121,6 @@ # use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); -# fast, clear, nasty, brutish and short. 
-# for _allowable_filters(), _set_mapping() -# covers BLAST, FAST families -# FASTA is ambiguous (nt or aa) based on alg name only - -my $alg_lookup = { - 'N' => { 'q' => qr/[s]/, - 'h' => qr/[s]/, - 'mapping' => [1,1]}, - 'P' => { 'q' => '', - 'h' => '', - 'mapping' => [1,1] }, - 'X' => { 'q' => qr/[sf]/, - 'h' => '', - 'mapping' => [3, 1]}, - 'Y' => { 'q' => qr/[sf]/, - 'h' => '', - 'mapping' => [3, 1]}, - 'TA' => { 'q' => '', - 'h' => qr/[sf]/, - 'mapping' => [1, 3]}, - 'TN' => { 'q' => '', - 'h' => qr/[sf]/, - 'mapping' => [1, 3]}, - 'TX' => { 'q' => qr/[sf]/, - 'h' => qr/[sf]/, - 'mapping' => [3, 3]}, # correct? - 'TY' => { 'q' => qr/[sf]/, - 'h' => qr/[sf]/, - 'mapping' => [3, 3]} -}; - - =head2 CONSTRUCTOR =head2 new @@ -836,20 +803,22 @@ last; }; ($_ eq 'max') && do { - my ($inc_i, $inc_c) = $hsp->matches( - -SEQ => $type, @@ Diff output truncated at 10000 characters. @@ From maj at dev.open-bio.org Tue May 26 23:47:33 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 26 May 2009 23:47:33 -0400 Subject: [Bioperl-guts-l] [15713] bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t: keyword subst Message-ID: <200905270347.n4R3lXlH011864@dev.open-bio.org> Revision: 15713 Author: maj Date: 2009-05-26 23:47:33 -0400 (Tue, 26 May 2009) Log Message: ----------- keyword subst Modified Paths: -------------- bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t Property Changed: ---------------- bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t Modified: bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t 2009-05-27 03:45:34 UTC (rev 15712) +++ bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t 2009-05-27 03:47:33 UTC (rev 15713) @@ -1,5 +1,5 @@ # -*-Perl-*- Test Harness script for Bioperl -# $Id: SearchIO_blast.t 14995 2008-11-16 06:20:00Z cjfields $ +# $Id$ # convert to use MapTiling.t / maj Property changes on: 
bioperl-dev/trunk/t/SearchIO/blast-MapTiling.t ___________________________________________________________________ Name: svn:keywords + Id Author Date Rev From maj at dev.open-bio.org Tue May 26 23:50:27 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Tue, 26 May 2009 23:50:27 -0400 Subject: [Bioperl-guts-l] [15714] bioperl-dev/trunk/t/data/frac_problems2.blast: This version of frac_problems2.blast has Message-ID: <200905270350.n4R3oRoU011895@dev.open-bio.org> Revision: 15714 Author: maj Date: 2009-05-26 23:50:27 -0400 (Tue, 26 May 2009) Log Message: ----------- This version of frac_problems2.blast has a small "bug fix". Here is a description of a weird parsing 'bug':

In a blastn, one will have an HSP like

 100 atcg 103
     || |
 400 atgg 397

No problem: the query string is 'atcg', the match string is '|| |', and the hit string is 'atgg'. But if we have

 100 atcg 103
     |||
 400 atcc 397

then the query string is 'atcg', the match string is '|||', and the hit string is 'atcc'. That is, the match string is missing the space at the end. This may be a bug in the report-generating program. To fix it, a space should be added back where necessary.

Added Paths: ----------- bioperl-dev/trunk/t/data/frac_problems2.blast Added: bioperl-dev/trunk/t/data/frac_problems2.blast =================================================================== --- bioperl-dev/trunk/t/data/frac_problems2.blast (rev 0) +++ bioperl-dev/trunk/t/data/frac_problems2.blast 2009-05-27 03:50:27 UTC (rev 15714) @@ -0,0 +1,315 @@ +BLASTN 2.2.6 [Apr-09-2003] + + +Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, +Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), +"Gapped BLAST and PSI-BLAST: a new generation of protein database search +programs", Nucleic Acids Res. 25:3389-3402.
+ +Query= AEDES_02704.C + (1069 letters) + +Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa + 4758 sequences; 1,383,971,543 total letters + +Searching..........done + + Score E +Sequences producing significant alignments: (bits) Value + +supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 858 0.0 + +>supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 + Length = 2064756 + + Score = 858 bits (433), Expect = 0.0 + Identities = 448/453 (98%) + Strand = Plus / Plus + + +Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 759400 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 759459 + + +Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 759460 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 759519 + + +Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 + |||||||| |||||||||||||||||| |||||||||||||||| ||||||||||||||| +Sbjct: 759520 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcatcctttctgacg 759579 + + +Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 + ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 759580 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggctg 759639 + + +Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 759640 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 759699 + + +Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 + |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| +Sbjct: 759700 tgagtcacagtccgctcttcctccgatgtgtcaaatgtcaaacgctgatatggctacgga 759759 + + +Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
+Sbjct: 759760 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 759819 + + +Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 + ||||||||||||||||||||||||||||||||| +Sbjct: 759820 gagccaaagaacgaaactgcaacgaaaaaaccc 759852 + + + + Score = 803 bits (405), Expect = 0.0 + Identities = 441/453 (97%) + Strand = Plus / Plus + + +Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 + ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| +Sbjct: 768455 cattttaaatgcatatattgggtgccatcatgactacctgactcctaaacttgacctcga 768514 + + +Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 + ||||||||||||||| ||||||||||||||||||||||||||||||| |||||||||||| +Sbjct: 768515 ggcctatattctatctcttcttacatgtagtggcttaatcctagatttctggtactcacg 768574 + + +Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 + |||||||| |||||||||||||||||| |||||||||||||||| |||| |||||||||| +Sbjct: 768575 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcattctttctgacg 768634 + + +Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 + ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| +Sbjct: 768635 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccccctcagctgaagcggctg 768694 + + +Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 768695 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 768754 + + +Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 + ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| +Sbjct: 768755 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaactgctgatatggctacgga 768814 + + +Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 768815 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 768874 + + +Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 + |||||||||||| |||||||||||||||||||| 
+Sbjct: 768875 gagccaaagaacaaaactgcaacgaaaaaaccc 768907 + + + + Score = 317 bits (160), Expect = 3e-84 + Identities = 170/172 (98%), Gaps = 1/172 (0%) + Strand = Plus / Plus + + +Query: 899 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 958 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 769407 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 769466 + + +Query: 959 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttt 1018 + ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 769467 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttc 769526 + + +Query: 1019 tctttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 + ||||||||||||||| |||||||||||||||||||||||||||||||||||| +Sbjct: 769527 tctttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 769578 + + + + Score = 311 bits (157), Expect = 2e-82 + Identities = 167/169 (98%), Gaps = 1/169 (0%) + Strand = Plus / Plus + + +Query: 902 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 961 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 760355 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 760414 + + +Query: 962 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttttct 1021 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| +Sbjct: 760415 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttctct 760474 + + +Query: 1022 ttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 + |||||||||||| |||||||||||||||||||||||||||||||||||| +Sbjct: 760475 ttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 760523 + + + + Score = 293 bits (148), Expect = 5e-77 + Identities = 151/152 (99%) + Strand = Plus / Plus + + +Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 + ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| +Sbjct: 769138 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 769197 + + +Query: 647 
cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 769198 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 769257 + + +Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 + |||||||||||||||||||||||||||||||| +Sbjct: 769258 ggacaatcacgtcggtttcgaagcggttggcc 769289 + + + + Score = 293 bits (148), Expect = 5e-77 + Identities = 151/152 (99%) + Strand = Plus / Plus + + +Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 + ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| +Sbjct: 760083 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 760142 + + +Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 760143 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 760202 + + +Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 + |||||||||||||||||||||||||||||||| +Sbjct: 760203 ggacaatcacgtcggtttcgaagcggttggcc 760234 + + + + Score = 242 bits (122), Expect = 2e-61 + Identities = 125/126 (99%) + Strand = Plus / Plus + + +Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 + |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| +Sbjct: 768959 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 769018 + + +Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 769019 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 769078 + + +Query: 583 cgtccc 588 + |||||| +Sbjct: 769079 cgtccc 769084 + + + + Score = 242 bits (122), Expect = 2e-61 + Identities = 125/126 (99%) + Strand = Plus / Plus + + +Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 + |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| +Sbjct: 759904 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 759963 + + 
+Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 + |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| +Sbjct: 759964 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 760023 + + +Query: 583 cgtccc 588 + |||||| +Sbjct: 760024 cgtccc 760029 + + + + Score = 123 bits (62), Expect = 1e-25 + Identities = 65/66 (98%) + Strand = Plus / Plus + + +Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 + |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| +Sbjct: 769344 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 769403 + + +Query: 797 tccttc 802 + |||||| +Sbjct: 769404 tccttc 769409 + + + + Score = 121 bits (61), Expect = 4e-25 + Identities = 64/65 (98%) + Strand = Plus / Plus + + +Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 + |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| +Sbjct: 760289 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 760348 @@ Diff output truncated at 10000 characters. @@ From bugzilla-daemon at portal.open-bio.org Wed May 27 17:40:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 May 2009 17:40:21 -0400 Subject: [Bioperl-guts-l] [Bug 2842] New: add mask_columns to SimpleAlign.pm Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2842 Summary: add mask_columns to SimpleAlign.pm Product: BioPerl Version: 1.6 branch Platform: PC OS/Version: Linux Status: UNCONFIRMED Severity: enhancement Priority: P2 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: tristan.lefebure at gmail.com Hi there, I'm hoping I'm not re-inventing the wheel. So, I was looking for a method to mask a portion of an alignment, e.g. low quality alignment regions or duplicated regions. I did not find such a thing in Bioperl, so I added a simple one in SimpleAlign.pm called mask_columns(). 
It's almost a copy-paste of slice(), except that I was not sure what the Bio::Seq::Meta sections were used for, so I removed them. I will attach a diff file, as well as an example. (This is my first real patch, be kind!) Cheers, --Tristan From bugzilla-daemon at portal.open-bio.org Wed May 27 17:41:41 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 May 2009 17:41:41 -0400 Subject: [Bioperl-guts-l] [Bug 2842] add mask_columns to SimpleAlign.pm In-Reply-To: Message-ID: <200905272141.n4RLffPY017913@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2842 ------- Comment #1 from tristan.lefebure at gmail.com 2009-05-27 17:41 EST ------- Created an attachment (id=1306) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1306&action=view) Diff file Produced with: diff -u SimpleAlign.pm SimpleAlignMod.pm > ~/SimpleAlign_patch.diff From bugzilla-daemon at portal.open-bio.org Wed May 27 17:43:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 May 2009 17:43:44 -0400 Subject: [Bioperl-guts-l] [Bug 2842] add mask_columns to SimpleAlign.pm In-Reply-To: Message-ID: <200905272143.n4RLhiYB018141@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2842 ------- Comment #2 from tristan.lefebure at gmail.com 2009-05-27 17:43 EST ------- Created an attachment (id=1307) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1307&action=view) a test script A simple example running the mask_columns method.
From bugzilla-daemon at portal.open-bio.org Wed May 27 17:44:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 May 2009 17:44:34 -0400 Subject: [Bioperl-guts-l] [Bug 2842] add mask_columns to SimpleAlign.pm In-Reply-To: Message-ID: <200905272144.n4RLiYxP018221@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2842 ------- Comment #3 from tristan.lefebure at gmail.com 2009-05-27 17:44 EST ------- Created an attachment (id=1308) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1308&action=view) an accompanying example dataset From bugzilla-daemon at portal.open-bio.org Wed May 27 17:57:00 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 May 2009 17:57:00 -0400 Subject: [Bioperl-guts-l] [Bug 2842] add mask_columns to SimpleAlign.pm In-Reply-To: Message-ID: <200905272157.n4RLv0ag019604@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2842 tristan.lefebure at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tristan.lefebure at gmail.com
From cjfields at dev.open-bio.org Thu May 28 18:10:38 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Thu, 28 May 2009 18:10:38 -0400 Subject: [Bioperl-guts-l] [15715] bioperl-live/trunk/t/Seq/LocatableSeq.t: add tests for '*' Message-ID: <200905282210.n4SMAc6K019375@dev.open-bio.org> Revision: 15715 Author: cjfields Date: 2009-05-28 18:10:37 -0400 (Thu, 28 May 2009) Log Message: ----------- add tests for '*' Modified Paths: -------------- bioperl-live/trunk/t/Seq/LocatableSeq.t Modified: bioperl-live/trunk/t/Seq/LocatableSeq.t =================================================================== --- bioperl-live/trunk/t/Seq/LocatableSeq.t 2009-05-27 03:50:27 UTC (rev 15714) +++ bioperl-live/trunk/t/Seq/LocatableSeq.t 2009-05-28 22:10:37 UTC (rev 15715) @@ -7,7 +7,7 @@ use lib '.'; use Bio::Root::Test; - test_begin(-tests => 116); + test_begin(-tests => 118); use_ok('Bio::LocatableSeq'); use_ok('Bio::AlignIO'); @@ -236,6 +236,22 @@ ok $@; like $@, qr/Overriding value \[554\] with value 552/; +lives_ok { $seq = Bio::LocatableSeq->new( + -seq => 'LSYC*', + -strand => 0, + -start => 1, + -end => 5, + -verbose => 2 + );} '* is counted in length'; + +throws_ok { $seq = Bio::LocatableSeq->new( + -seq => 'LSYC*', + -strand => 0, + -start => 1, + -end => 6, + -verbose => 2 + );} qr/Overriding value \[6\] with value 5/, '* is counted in length, but end is wrong'; + # setting symbols (class variables) - demonstrate scoping issues when using # globals with and w/o localization. 
To be fixed in a future BioPerl version From cjfields at dev.open-bio.org Thu May 28 20:14:53 2009 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Thu, 28 May 2009 20:14:53 -0400 Subject: [Bioperl-guts-l] [15716] bioperl-live/trunk/Bio/AlignIO/fasta.pm: * counts as a residue; use LocatableSeq's symbols for consistency Message-ID: <200905290014.n4T0Er04019591@dev.open-bio.org> Revision: 15716 Author: cjfields Date: 2009-05-28 20:14:53 -0400 (Thu, 28 May 2009) Log Message: ----------- * counts as a residue; use LocatableSeq's symbols for consistency Modified Paths: -------------- bioperl-live/trunk/Bio/AlignIO/fasta.pm Modified: bioperl-live/trunk/Bio/AlignIO/fasta.pm =================================================================== --- bioperl-live/trunk/Bio/AlignIO/fasta.pm 2009-05-28 22:10:37 UTC (rev 15715) +++ bioperl-live/trunk/Bio/AlignIO/fasta.pm 2009-05-29 00:14:53 UTC (rev 15716) @@ -62,6 +62,7 @@ use base qw(Bio::AlignIO); our $WIDTH = 60; +use Bio::LocatableSeq; =head2 next_aln @@ -228,7 +229,8 @@ sub _get_len { my ($self,$seq) = @_; - $seq =~ s/[^A-Z]//gi; + my $chars = $Bio::LocatableSeq::RESIDUE_SYMBOLS; + $seq =~ s{[^$chars]+}{}gi; return CORE::length($seq); } From bugzilla-daemon at portal.open-bio.org Fri May 29 10:14:58 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 May 2009 10:14:58 -0400 Subject: [Bioperl-guts-l] [Bug 2843] New: FeatureIO BED to GFF fails with no phase Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2843 Summary: FeatureIO BED to GFF fails with no phase Product: BioPerl Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: caroline.johnston at iop.kcl.ac.uk Converting a BED file to a GFF v3 file with FeatureIO fails when there is no phase info, giving: "Can't call method "value" without a package or object reference at 
/usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm line 903, line 1." Offending line is: my $phase = $feature->phase->value; From bugzilla-daemon at portal.open-bio.org Fri May 29 10:15:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 May 2009 10:15:39 -0400 Subject: [Bioperl-guts-l] [Bug 2843] FeatureIO BED to GFF fails with no phase In-Reply-To: Message-ID: <200905291415.n4TEFdAO002605@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2843 ------- Comment #1 from caroline.johnston at iop.kcl.ac.uk 2009-05-29 10:15 EST ------- Possible fix (as per score line 902): my $phase = defined($feature->phase) ? (ref($feature->phase) ? $feature->phase->value : $feature->phase) : undef;
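The fix proposed in comment #1 only calls value() when the phase is a defined reference, so an undefined phase, a plain scalar phase and an object-wrapped phase are all handled. The self-contained sketch below exercises that guard. MockFeature and PhaseValue are hypothetical stand-ins for the real feature and annotation classes, and writing '.' for a missing phase is an assumption based on the GFF3 convention for column 8, not a description of what gff.pm actually emits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object that wraps the phase
# and exposes it through value().
package PhaseValue;
sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
sub value { my ($self) = @_; return $self->{v} }

# Hypothetical minimal feature whose phase() may return undef,
# a plain number, or a PhaseValue object.
package MockFeature;
sub new   { my ($class, $p) = @_; return bless { phase => $p }, $class }
sub phase { my ($self) = @_; return $self->{phase} }

package main;

# Same logic as the proposed fix: only call value() on a defined reference.
sub phase_of {
    my ($feature) = @_;
    my $phase = $feature->phase;
    return undef unless defined $phase;
    return ref($phase) ? $phase->value : $phase;
}

# Assumed GFF3 output convention: '.' when no phase is available.
sub gff_phase_column {
    my ($feature) = @_;
    my $phase = phase_of($feature);
    return defined $phase ? $phase : '.';
}

for my $f (MockFeature->new(undef),
           MockFeature->new(0),
           MockFeature->new(PhaseValue->new(2))) {
    print gff_phase_column($f), "\n";
}
```

Testing with defined() rather than for truth matters here, because 0 is a valid phase and must not be mistaken for a missing one.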
From bugzilla-daemon at portal.open-bio.org Fri May 29 10:19:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 May 2009 10:19:50 -0400 Subject: [Bioperl-guts-l] [Bug 2843] FeatureIO BED to GFF fails with no phase In-Reply-To: Message-ID: <200905291419.n4TEJoUu002951@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2843 ------- Comment #2 from caroline.johnston at iop.kcl.ac.uk 2009-05-29 10:19 EST ------- Created an attachment (id=1309) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1309&action=view) check $feature->phase is defined From lstein at dev.open-bio.org Fri May 29 10:49:09 2009 From: lstein at dev.open-bio.org (Lincoln Stein) Date: Fri, 29 May 2009 10:49:09 -0400 Subject: [Bioperl-guts-l] [15717] bioperl-live/trunk/Bio/DB/SeqFeature.pm: make feature_id an alias for primary_id Message-ID: <200905291449.n4TEn9tc022401@dev.open-bio.org> Revision: 15717 Author: lstein Date: 2009-05-29 10:49:07 -0400 (Fri, 29 May 2009) Log Message: ----------- make feature_id an alias for primary_id Modified Paths: -------------- bioperl-live/trunk/Bio/DB/SeqFeature.pm Modified: bioperl-live/trunk/Bio/DB/SeqFeature.pm =================================================================== --- bioperl-live/trunk/Bio/DB/SeqFeature.pm 2009-05-29 00:14:53 UTC (rev 15716) +++ bioperl-live/trunk/Bio/DB/SeqFeature.pm 2009-05-29 14:49:07 UTC (rev 15717) @@ -387,6 +387,10 @@ # for Bio::LocationI compatibility sub location_type { return 'EXACT' } +# for Bio::DB::GFF compatibility + +sub feature_id {shift->primary_id} + 1; From bugzilla-daemon at portal.open-bio.org Fri May 29 11:27:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 May 2009 11:27:42 -0400 Subject:
[Bioperl-guts-l] [Bug 2842] add mask_columns to SimpleAlign.pm In-Reply-To: Message-ID: <200905291527.n4TFRgmc008981@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2842 tristan.lefebure at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1306 is|0 |1 obsolete| | ------- Comment #4 from tristan.lefebure at gmail.com 2009-05-29 11:27 EST ------- Created an attachment (id=1310) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1310&action=view) new (this time working) patch Oops, this was a poor contribution: the first submitted patch did not do the job very well. Sorry about that; I should be more careful in the future. Now, for example, $aln->mask_columns(15,20,'?') masks columns 15 to 20 of every sequence:

Before:

 3 37
seq1       AAAATGGGGG TGGT------ GGTACCT--- -------
seq2       -----GGCGG TGGTGNNNNG GGTTCCCTNN NNNNNNN
new        AAAATGGNGG TGGTN----N GGTNCCNTNN NNNNNNN

After:

 3 37
seq1       AAAATGGGGG TGGT?????? GGTACCT--- -------
seq2       -----GGCGG TGGT?????? GGTTCCCTNN NNNNNNN
new        AAAATGGNGG TGGT?????? GGTNCCNTNN NNNNNNN

Cheers,
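The behaviour in the Before/After example (columns 15 to 20, counted from 1 and inclusive, replaced by '?' in every sequence) can be sketched on plain aligned strings. mask_columns_str is a hypothetical helper written only for illustration. It is not the code from the attached patch and does not use the Bio::SimpleAlign API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace columns $start..$end (1-based, inclusive)
# of every aligned sequence with $char. Works on plain strings only.
sub mask_columns_str {
    my ($seqs, $start, $end, $char) = @_;
    die "Invalid column range $start-$end\n" if $start < 1 || $end < $start;
    my $width = $end - $start + 1;
    my %masked;
    for my $id (keys %$seqs) {
        my $s = $seqs->{$id};    # work on a copy; the input stays untouched
        die "Column $end is past the end of $id\n" if $end > length($s);
        # 4-argument substr replaces the slice in place
        substr($s, $start - 1, $width, $char x $width);
        $masked{$id} = $s;
    }
    return \%masked;
}

# The three sequences from the example, with the display blocks joined.
my %aln = (
    seq1 => join('', 'AAAATGGGGG', 'TGGT------', 'GGTACCT---', '-------'),
    seq2 => join('', '-----GGCGG', 'TGGTGNNNNG', 'GGTTCCCTNN', 'NNNNNNN'),
    new  => join('', 'AAAATGGNGG', 'TGGTN----N', 'GGTNCCNTNN', 'NNNNNNN'),
);

my $masked = mask_columns_str(\%aln, 15, 20, '?');
printf "%-5s %s\n", $_, $masked->{$_} for sort keys %$masked;
```

Copying each string before calling substr() leaves the input alignment unchanged, so the masked result can be compared against the original.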
From maj at dev.open-bio.org Sat May 30 20:02:22 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 30 May 2009 20:02:22 -0400 Subject: [Bioperl-guts-l] [15718] bioperl-dev/trunk/t/data/1ZZ19XR301R-Alignment.tblastn: tiling test data Message-ID: <200905310002.n4V02MI0018678@dev.open-bio.org> Revision: 15718 Author: maj Date: 2009-05-30 20:02:21 -0400 (Sat, 30 May 2009) Log Message: ----------- tiling test data Added Paths: ----------- bioperl-dev/trunk/t/data/1ZZ19XR301R-Alignment.tblastn Added: bioperl-dev/trunk/t/data/1ZZ19XR301R-Alignment.tblastn =================================================================== --- bioperl-dev/trunk/t/data/1ZZ19XR301R-Alignment.tblastn (rev 0) +++ bioperl-dev/trunk/t/data/1ZZ19XR301R-Alignment.tblastn 2009-05-31 00:02:21 UTC (rev 15718) @@ -0,0 +1,38858 @@ +TBLASTN 2.2.20+ +Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro +A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and +David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new +generation of protein database search programs", Nucleic +Acids Res. 25:3389-3402. + + +RID: 1ZZ19XR301R + + +Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, +GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) + 8,818,151 sequences; 27,400,875,368 total letters +Query= gi|5032311|ref|NP_004013.1| dystrophin Dp140ab isoform [Homo sapiens] +Length=1230 + + + Score E +Sequences producing significant alignments: (Bits) Value + +ref|NM_004022.2| Homo sapiens dystrophin (DMD), transcript va... 2544 0.0 +dbj|AB208836.1| Homo sapiens mRNA for dystrophin Dp427c isofo... 2538 0.0 +ref|NM_004021.2| Homo sapiens dystrophin (DMD), transcript va... 2534 0.0 +gb|BC150141.1| Homo sapiens dystrophin, mRNA (cDNA clone IMAG... 2531 0.0 +ref|NM_004010.3| Homo sapiens dystrophin (DMD), transcript va... 2486 0.0 +ref|NM_004009.3| Homo sapiens dystrophin (DMD), transcript va... 2486 0.0 +ref|NM_004007.2| Homo sapiens dystrophin (DMD), transcript va... 
2486 0.0 +ref|NM_004006.2| Homo sapiens dystrophin (DMD), transcript va... 2486 0.0 +ref|NM_000109.3| Homo sapiens dystrophin (DMD), transcript va... 2486 0.0 +gb|EU048698.1| Shuttle vector phcAd.DYS-FL, complete sequence 2486 0.0 +gb|M18533.1|HUMDYS Homo sapiens dystrophin (DMD) mRNA, comple... 2486 0.0 +emb|X14298.1| Human mRNA for dystrophin 2486 0.0 +gb|BC111836.3| Synthetic construct Homo sapiens clone IMAGE:4... 2485 0.0 +gb|BC118002.1| Synthetic construct Homo sapiens clone IMAGE:4... 2485 0.0 +gb|BC111934.1| Synthetic construct Homo sapiens clone IMAGE:4... 2485 0.0 +gb|BC111587.2| Synthetic construct Homo sapiens clone IMAGE:4... 2485 0.0 +ref|NM_004012.3| Homo sapiens dystrophin (DMD), transcript va... 2484 0.0 +ref|NM_004011.3| Homo sapiens dystrophin (DMD), transcript va... 2484 0.0 +ref|NM_004013.2| Homo sapiens dystrophin (DMD), transcript va... 2469 0.0 +gb|BC127103.1| Homo sapiens dystrophin, mRNA (cDNA clone IMAG... 2467 0.0 +ref|XR_022800.1| PREDICTED: Pan troglodytes dystrophin (DMD),... 2450 0.0 +ref|XM_850502.1| PREDICTED: Canis familiaris hypothetical pro... 2449 0.0 +emb|AJ865385.1| Sus scrofa mRNA for dystrophin variant Dp427 ... 2447 0.0 +ref|XM_001488124.2| PREDICTED: Equus caballus similar to dyst... 2443 0.0 +gb|AF070485.1| Canis familiaris dystrophin mRNA, complete cds 2429 0.0 +ref|NM_007868.5| Mus musculus dystrophin, muscular dystrophy ... 2414 0.0 +gb|M68859.1|MUSDYSA Mouse dystrophin mRNA, complete cds 2414 0.0 +ref|XM_001379309.1| PREDICTED: Monodelphis domestica similar ... 2325 0.0 +emb|X13369.1| Chicken mRNA for dystrophin (Duchenne muscular ... 2182 0.0 +ref|XM_002191629.1| PREDICTED: Taeniopygia guttata dystrophin... 2078 0.0 +ref|NM_004023.2| Homo sapiens dystrophin (DMD), transcript va... 1975 0.0 +ref|NM_004020.2| Homo sapiens dystrophin (DMD), transcript va... 1955 0.0 +ref|NM_004014.2| Homo sapiens dystrophin (DMD), transcript va... 1905 0.0 +ref|XM_001096400.1| PREDICTED: Macaca mulatta similar to dyst... 
1707 0.0 +gb|AF339031.1|AF339031 Danio rerio dystrophin (dmd) mRNA, par... 1583 0.0 +gb|BC085236.1| Mus musculus dystrophin, muscular dystrophy, m... 1519 0.0 +ref|XM_419648.2| PREDICTED: Gallus gallus utrophin (homologou... 1508 0.0 +ref|XM_518782.2| PREDICTED: Pan troglodytes utrophin, transcr... 1498 0.0 +ref|XM_001172875.1| PREDICTED: Pan troglodytes utrophin, tran... 1498 0.0 +ref|XM_001172869.1| PREDICTED: Pan troglodytes utrophin, tran... 1498 0.0 +ref|XM_001506948.1| PREDICTED: Ornithorhynchus anatinus simil... 1495 0.0 +gb|M37645.1|FSCDYSTRO Torpedo californica dystrophin mRNA, 3'... 1493 0.0 +ref|XR_054516.1| PREDICTED: Taeniopygia guttata misc_RNA (LOC... 1491 0.0 +ref|XM_001380994.1| PREDICTED: Monodelphis domestica similar ... 1481 0.0 +ref|NM_007124.2| Homo sapiens utrophin (UTRN), mRNA 1470 0.0 +emb|X69086.1| H.sapiens mRNA for utrophin 1470 0.0 +ref|XM_001788161.1| PREDICTED: Bos taurus similar to putative... 1467 0.0 +emb|AJ002967.1| Rattus norvegicus mRNA for utrophin 1467 0.0 +ref|XR_010548.1| PREDICTED: Macaca mulatta utrophin (UTRN), mRNA 1459 0.0 +gb|AY095485.1| Canis familiaris utrophin mRNA, complete cds 1457 0.0 +ref|XM_001915985.1| PREDICTED: Equus caballus similar to utro... 1451 0.0 +ref|NM_011682.4| Mus musculus utrophin (Utrn), mRNA 1438 0.0 +emb|Y12229.1| M.musculus mRNA for utrophin 1438 0.0 +gb|M92650.1|HUMDMDXX Human Duchenne muscular dystrophy (DMD) ... 1299 0.0 +ref|NM_004018.2| Homo sapiens dystrophin (DMD), transcript va... 1298 0.0 +emb|CR859102.1| Pongo abelii mRNA; cDNA DKFZp459C1629 (from c... 1295 0.0 +ref|NM_001005244.1| Rattus norvegicus dystrophin, muscular dy... 1294 0.0 +gb|AY326948.1| Rattus norvegicus dystrophin Dp71ab (Dmd) mRNA... 1290 0.0 +gb|BC094758.1| Homo sapiens dystrophin, mRNA (cDNA clone IMAG... 1287 0.0 +gb|BC028720.1| Homo sapiens dystrophin, mRNA (cDNA clone IMAG... 1287 0.0 +ref|NM_004016.2| Homo sapiens dystrophin (DMD), transcript va... 
1286 0.0 +emb|CR858847.1| Pongo abelii mRNA; cDNA DKFZp469A0710 (from c... 1266 0.0 +emb|X83506.1| M.musculus mRNA for G-utrophin 1237 0.0 +emb|CR848277.2| Xenopus tropicalis finished cDNA, clone TNeu1... 1236 0.0 +gb|BC080941.1| Xenopus tropicalis dystrophin, mRNA (cDNA clon... 1236 0.0 +ref|NM_004017.2| Homo sapiens dystrophin (DMD), transcript va... 1235 0.0 +ref|NM_012698.2| Rattus norvegicus dystrophin, muscular dystr... 1234 0.0 +gb|BC149235.1| Bos taurus dystrophin, mRNA (cDNA clone MGC:15... 1230 0.0 +gb|BC070078.1| Homo sapiens dystrophin, mRNA (cDNA clone IMAG... 1230 0.0 +gb|AY326947.1| Rattus norvegicus dystrophin Dp71a (Dmd) mRNA,... 1229 0.0 +ref|NM_004015.2| Homo sapiens dystrophin (DMD), transcript va... 1224 0.0 +dbj|AK036936.1| Mus musculus adult female vagina cDNA, RIKEN ... 1223 0.0 +gb|BC082429.1| Xenopus laevis dystrophin, mRNA (cDNA clone MG... 1220 0.0 +gb|BC136240.1| Xenopus tropicalis cDNA clone IMAGE:7661888, c... 1142 0.0 +dbj|AK159639.1| Mus musculus osteoclast-like cell cDNA, RIKEN... 1104 0.0 +emb|X99702.1| S.caniculua mRNA for dystrophin 1100 0.0 +emb|X99700.1| X.laevis mRNA for dystrophin 1100 0.0 +ref|XM_002227549.1| Branchiostoma floridae hypothetical prote... 1096 0.0 +gb|AF304204.1|AF304204 Strongylocentrotus purpuratus dystroph... 1093 0.0 +ref|XM_001918785.1| PREDICTED: Danio rerio hypothetical LOC79... 1092 0.0 +gb|BC095190.1| Danio rerio dystrophin, mRNA (cDNA clone MGC:1... 1091 0.0 +ref|XM_001060977.1| PREDICTED: Rattus norvegicus similar to d... 1081 0.0 +gb|U43517.2|SCU43517 Scyliorhinus canicula dystrophin-related... 1078 0.0 +ref|XM_002187321.1| PREDICTED: Taeniopygia guttata similar to... 1025 0.0 +gb|U43519.1|HSU43519 Human dystrophin-related protein 2 (DRP2... 1005 0.0 +ref|XM_001493019.2| PREDICTED: Equus caballus dystrophin rela... 1003 0.0 +dbj|AK295843.1| Homo sapiens cDNA FLJ52301 complete cds, high... 1002 0.0 +gb|BC162218.1| Danio rerio dystrophin related protein 2, mRNA... 
1002 0.0 +ref|XM_001092374.1| PREDICTED: Macaca mulatta dystrophin rela... 1001 0.0 +ref|NM_001939.2| Homo sapiens dystrophin related protein 2 (D... 1001 0.0 +gb|BC111695.1| Homo sapiens dystrophin related protein 2, mRN... 1001 0.0 +gb|DQ443728.1| Danio rerio dystrophin-related protein 2 (DRP2... 1001 0.0 +dbj|AB384915.1| Synthetic construct DNA, clone: pF1KB4196, Ho... 1000 0.0 +dbj|AK289825.1| Homo sapiens cDNA FLJ75585 complete cds, high... 999 0.0 +gb|AF195788.1|AF195788 Rattus norvegicus dystrophin-related p... 996 0.0 +gb|AF195787.1|AF195787 Rattus norvegicus dystrophin-related p... 996 0.0 +ref|XM_538105.2| PREDICTED: Canis familiaris similar to Dystr... 994 0.0 +dbj|AK158102.1| Mus musculus adult inner ear cDNA, RIKEN full... 993 0.0 +ref|NM_010078.3| Mus musculus dystrophin related protein 2 (D... 991 0.0 +gb|BC125347.1| Mus musculus dystrophin related protein 2, mRN... 991 0.0 +gb|BC125345.1| Mus musculus dystrophin related protein 2, mRN... 991 0.0 +ref|XM_001364344.1| PREDICTED: Monodelphis domestica similar ... 991 0.0 +ref|XM_617584.4| PREDICTED: Bos taurus similar to dystrophin ... 991 0.0 +dbj|AK081426.1| Mus musculus 16 days embryo head cDNA, RIKEN ... 989 0.0 +tpg|BK005803.1| TPA: TPA_inf: Ornithorhynchus anatinus dystro... 981 0.0 +emb|AJ223356.1| Strongylocentrotus purpuratus mRNA for SuDp98... 973 0.0 +ref|XR_023213.1| PREDICTED: Pan troglodytes similar to dystro... 953 0.0 +gb|BC136034.1| Xenopus tropicalis dystrophin related protein ... 929 0.0 @@ Diff output truncated at 10000 characters. 
@@ From maj at dev.open-bio.org Sat May 30 20:06:18 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 30 May 2009 20:06:18 -0400 Subject: [Bioperl-guts-l] [15719] bioperl-dev/trunk/t/SearchIO/Tiling.t: exhaustive (or exhausting, anyway) tests Message-ID: <200905310006.n4V06I1N018896@dev.open-bio.org> Revision: 15719 Author: maj Date: 2009-05-30 20:06:18 -0400 (Sat, 30 May 2009) Log Message: ----------- exhaustive (or exhausting, anyway) tests Modified Paths: -------------- bioperl-dev/trunk/t/SearchIO/Tiling.t Modified: bioperl-dev/trunk/t/SearchIO/Tiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-31 00:02:21 UTC (rev 15718) +++ bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-31 00:06:18 UTC (rev 15719) @@ -5,7 +5,7 @@ use lib '.'; use lib '../..'; use Bio::Root::Test; - test_begin(-tests => 1000 ); + test_begin(-tests => 6488 ); } use_ok('Bio::Search::Tiling::MapTiling'); @@ -16,16 +16,96 @@ chdir('../..'); -ok( my $parser = new Bio::SearchIO( +my ($blio, $result, $hit, $tiling, $hsp); +my @normal_formats = qw( blast wublast + blastn wublastn + blastp wublastp + multiblast + megablast + rpsblast + psiblast ); +my @xltd_formats = qw( blastx wublastx + tblastn wutblastn + tblastx wutblastx ); + + +my %test_files = ( + 'blast' => [qw( + ecolitst.bls + ecolitst.bls + frac_problems.blast + frac_problems2.blast + frac_problems3.blast + bl2seq.out + )], + 'multiblast' => [qw( + multi_blast.bls + )], + 'blastn' => [qw( + a_thaliana.blastn + bl2seq.blastn + new_blastn.txt + hsinsulin.blastcl3.blastn + )], + 'wublastn' =>[qw( + brassica_ATH.WUBLASTN + echofilter.wublastn + )], + 'blastp' => [qw( + blastp2215.blast + no_hsps.blastp + catalase-webblast.BLASTP + )], + 'wublastp' => [qw( + dcr1_sp.WUBLASTP + ecolitst.wublastp + contig-by-hand.wublastp + ecolitst.noseqs.wublastp + )], + 'blastx' => [qw( + bl2seq.blastx.out + )], + 'wublastx' => [qw( + dnaEbsub_ecoli.wublastx + )], + 
'wublast' => [qw( + tricky.wublast + )], + 'tblastn' => [qw( + tblastn.out + 1ZZ19XR301R-Alignment.tblastn + )], + 'wutblastn' => [qw( + dnaEbsub_ecoli.wutblastn + )], + 'tblastx' => [qw( + bl2seq.tblastx.out + HUMBETGLOA.tblastx + )], + 'wutblastx' => [qw( + dnaEbsub_ecoli.wutblastx + )], + 'megablast' => [qw( + 503384.MEGABLAST.2 + )], + 'rpsblast' => [qw( + ecoli_domains.rpsblast + )], + 'psiblast' => [qw( + psiblastreport.out + )] + ); + +ok( $blio = new Bio::SearchIO( -file=>test_input_file('dcr1_sp.WUBLASTP'), -format=>'blast'), 'parse data file'); -my $result = $parser->next_result; +$result = $blio->next_result; while ( $_ = $result->next_hit ) { last if $_->name =~ /ASPTN/; } -ok(my $test_hit = $_, 'got test hit'); -ok(my $tiling = Bio::Search::Tiling::MapTiling->new($test_hit), 'create tiling'); +ok($hit = $_, 'got test hit'); +ok($tiling = Bio::Search::Tiling::MapTiling->new($hit), 'create tiling'); # TilingI compliance @@ -77,75 +157,245 @@ # @filters = ($qstrand, $hstrand, $qframe, $hframe) -my %examples = ( - 'BLASTN' => ['blast', 'AE003528_ecoli.bls', - [1,-1, undef, undef], - [1,-1, 1, 1]], - 'BLASTP' => ['blast', 'catalase-webblast.BLASTP', - [undef, undef, undef, undef], - [1, undef, undef, undef]], - 'BLASTX' => ['blast', 'dnaEbsub_ecoli.wublastx', - [1, undef, undef, undef], - [undef, 1, undef, 1]], - 'TBLASTN'=> ['blast', 'dnaEbsub_ecoli.wutblastn', - [undef, 1, undef, 1], - [1, undef, 1, undef]], - 'TBLASTX'=> ['blast', 'dnaEbsub_ecoli.wutblastx', - [1, 1, 0, 1], - [1, -2, 3, 3]], - 'FASTA' => ['fasta', 'cysprot_vs_gadfly.FASTA', - [undef, undef, undef, undef], - [1, undef, undef, undef]], - 'FASTXY' => ['fasta', '5X_1895.FASTXY', - [1, undef, undef, undef], - [undef, 1, undef, 1]], - 'MEGABLAST' => ['blast', '503384.MEGABLAST.2', - [1,-1, undef, undef], - [1,-1, 1, 1]], - 'TFASTA' => undef, - 'TFASTX' => undef - ); +# my %examples = ( +# 'BLASTN' => ['blast', 'AE003528_ecoli.bls'], +# 'BLASTP' => ['blast', 'catalase-webblast.BLASTP'], +# 
+# 'BLASTX' => ['blast', 'dnaEbsub_ecoli.wublastx'], +# 'TBLASTN'=> ['blast', 'dnaEbsub_ecoli.wutblastn'], +# 'TBLASTX'=> ['blast', 'dnaEbsub_ecoli.wutblastx'], +# 'FASTA' => ['fasta', 'cysprot_vs_gadfly.FASTA'], +# 'FASTXY' => ['fasta', '5X_1895.FASTXY'], +# 'MEGABLAST' => ['blast', '503384.MEGABLAST.2'], +# 'TFASTA' => undef, +# 'TFASTX' => undef +# ); -my %results; +# my %results; -foreach (keys %examples) { - next unless $examples{$_}; - ok( my $blio = Bio::SearchIO->new( -format=>$examples{$_}[0], - -file =>test_input_file($examples{$_}[1])), - "$_ data file"); - my $hit = ($results{$_} = $blio->next_result)->next_hit; - ok( $tiling = Bio::Search::Tiling::MapTiling->new($hit, @{$examples{$_}[2]}), "tiling object created for $_ hit"); - dies_ok { Bio::Search::Tiling::MapTiling->new($hit, @{$examples{$_}[3]}) } "tiling object arg exception check for $_ hit"; - 1; -} +# foreach (keys %examples) { +# next unless $examples{$_}; +# ok( $blio = Bio::SearchIO->new( -format=>$examples{$_}[0], +# -file =>test_input_file($examples{$_}[1])), +# "$_ data file"); +# my $hit = ($results{$_} = $blio->next_result)->next_hit; +# } -# tricky wu-blast -ok (my $blio = Bio::SearchIO->new( -format=>'blast', - -file=>test_input_file('tricky.wublast')), - 'tricky.wublast') -ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'tricky tiling'); -my @map = $tiling->coverage_map_as_text('query',1); -@map = $tiling->coverage_map_as_text('hit',1); +diag("Old blast.t tiling tests"); -ok (my $blio = Bio::SearchIO->new( -format=>'blast', - -file=>test_input_file('frac_problems.blast')), - 'frac_problems.blast') -ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'frac_problems tiling'); +ok($blio = Bio::SearchIO->new( + '-format' => 'blast', + '-file' => test_input_file('ecolitst.wublastp') + ), "ecolitst.wublastp"); +$result = 
$blio->next_result; +$result->next_hit; +$hit = $result->next_hit; +$tiling =
Bio::Search::Tiling::MapTiling->new($hit); +# Test HSP contig data returned by SearchUtils::tile_hsps() +# Second hit has two hsps that overlap. -ok (my $blio = Bio::SearchIO->new( -format=>'blast', - -file=>test_input_file('frac_problems.blast')), - 'frac_problems2.blast') -ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'frac_problems2 tiling'); +# compare with the contig made by hand for these two contigs +# in t/data/contig-by-hand.wublastp +# (in this made-up file, the hsps from ecolitst.wublastp +# were aligned and contiged, and Length, Identities, Positives +# were counted, by a human (maj) ) + +my $hand_hit = Bio::SearchIO->new( + -format=>'blast', + -file=>test_input_file('contig-by-hand.wublastp') + )->next_result->next_hit; +my $hand_hsp = $hand_hit->next_hsp; +my @hand_qrng = $hand_hsp->range('query'); +my @hand_srng = $hand_hsp->range('hit'); +my @hand_matches = $hand_hit->matches; -ok (my $blio = Bio::SearchIO->new( -format=>'blast', - -file=>test_input_file('frac_problems.blast')), - 'frac_problems3.blast') -ok( $tiling = Bio::Search::Tiling::MapTiling->new($blio->next_result->next_hit), 'frac_problems3 tiling'); +is(($tiling->range('query'))[0], $hand_qrng[0]); +is(($tiling->range('query'))[1], $hand_qrng[1]); +is(sprintf("%d",$tiling->identities('query')), $hand_matches[0]); +is(sprintf("%d",$tiling->conserved('query')), $hand_matches[1]); +is(($tiling->range('hit'))[0], $hand_srng[0]); +is(($tiling->range('hit'))[1], $hand_srng[1]); +is(sprintf("%d",$tiling->identities('hit')), $hand_matches[0]); +is(sprintf("%d",$tiling->conserved('hit')), $hand_matches[1]); -# old blast.t tiling tests +ok( $blio = Bio::SearchIO->new( + '-format' => 'blast', + '-file' => test_input_file('dnaEbsub_ecoli.wublastx') + ), "dnaEbsub_ecoli.wublastx"); +$hit = $blio->next_result->next_hit; +my $tiling = Bio::Search::Tiling::MapTiling->new($hit); +is(sprintf("%.3f",$tiling->frac_identical(-type=>'query',-denom=>'aligned',-context=>'p2')), 
'0.364'); +is(sprintf("%.3f",$tiling->frac_identical(-type=>'hit',-denom=>'aligned',-context=>'all')), '0.366'); +is(sprintf("%.3f",$tiling->frac_conserved(-type=>'query',-denom=>'aligned',-context=>'p2')), '0.537'); +is(sprintf("%.3f",$tiling->frac_conserved(-type=>'hit',-denom=>'aligned',-context=>'all')), '0.540'); +is(sprintf("%.2f",$tiling->frac_aligned_query(-context=>'p2')), '0.62'); +is(sprintf("%.2f",$tiling->frac_aligned_hit(-context=>'all')), '0.71'); -1; +ok( $blio = Bio::SearchIO->new( + '-format' => 'blast', + '-file' => test_input_file('tricky.wublast') + ), "tricky.wublast"); +$hit = $blio->next_result->next_hit; +$tiling = Bio::Search::Tiling::MapTiling->new($hit); +cmp_ok sprintf("%.3f",$tiling->frac_identical(-denom => 'aligned')), '>', 0.2, 'tricky.wublast(1)'; +cmp_ok sprintf("%.3f",$tiling->frac_conserved(-denom => 'aligned')), '<=', 1, 'tricky.wublast(2)'; +is(sprintf("%.2f",$tiling->frac_aligned_query), '0.92', 'tricky.wublast(3)'); +is(sprintf("%.2f",$tiling->frac_aligned_hit), '0.91','tricky.wublast(4)'); + +diag("New tiling tests"); + +foreach my $alg (@normal_formats) { + diag("*******$alg files*******"); + foreach my $tf (@{$test_files{$alg}}) { + ok( $blio = Bio::SearchIO->new( -format=>'blast', + -file=>test_input_file($tf) + ), "$tf" ); + $result = $blio->next_result; + my $hit_count = 0; + # compare the per-aligned-base identity avg over hsps + # with frac_identical (bzw, conserved) + + HIT: + while ( $hit = $result->next_hit ) { + ++$hit_count; @@ Diff output truncated at 10000 characters. 
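The tests above check a tiling's aligned length and identity fractions against a contig assembled by hand from two overlapping HSPs. As a rough illustration of the underlying idea only (this is not the MapTiling algorithm, which builds a coverage map and estimates identities per interval), the Perl sketch below merges overlapping HSP ranges and reports the fraction of a sequence they cover. The subroutine names and the coordinates are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: merge [start, end] ranges (1-based, inclusive)
# into non-overlapping intervals, a crude model of tiling HSPs.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1] + 1) {
            # overlaps or abuts the previous interval: extend it if needed
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$r ];    # copy so the caller's data is untouched
        }
    }
    return @merged;
}

# Hypothetical helper: number of positions covered at least once.
sub covered_length {
    my $len = 0;
    $len += $_->[1] - $_->[0] + 1 for merge_ranges(@_);
    return $len;
}

# Made-up query coordinates: two overlapping HSPs and one disjoint HSP
# on a query of length 200.
my @hsp_ranges = ( [10, 80], [60, 120], [150, 170] );
my $aligned    = covered_length(@hsp_ranges);
printf "aligned=%d frac_aligned=%.2f\n", $aligned, $aligned / 200;
```

With these coordinates the two overlapping HSPs collapse into 10..120, so 132 of the 200 query positions are covered (0.66). Counting overlapping residues only once is the reason a tiled length can differ from the sum of the individual HSP lengths.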
@@ From maj at dev.open-bio.org Sat May 30 20:07:20 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 30 May 2009 20:07:20 -0400 Subject: [Bioperl-guts-l] [15720] bioperl-dev/trunk/t/SearchIO/Tiling.t: decruft Message-ID: <200905310007.n4V07K8N018965@dev.open-bio.org> Revision: 15720 Author: maj Date: 2009-05-30 20:07:19 -0400 (Sat, 30 May 2009) Log Message: ----------- decruft Modified Paths: -------------- bioperl-dev/trunk/t/SearchIO/Tiling.t Modified: bioperl-dev/trunk/t/SearchIO/Tiling.t =================================================================== --- bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-31 00:06:18 UTC (rev 15719) +++ bioperl-dev/trunk/t/SearchIO/Tiling.t 2009-05-31 00:07:19 UTC (rev 15720) @@ -14,8 +14,6 @@ use_ok('Bio::Search::Hit::BlastHit'); use_ok('File::Spec'); -chdir('../..'); - my ($blio, $result, $hit, $tiling, $hsp); my @normal_formats = qw( blast wublast blastn wublastn From maj at dev.open-bio.org Sat May 30 20:19:27 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 30 May 2009 20:19:27 -0400 Subject: [Bioperl-guts-l] [15721] bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm: Refactoring context handling Message-ID: <200905310019.n4V0JRWI019440@dev.open-bio.org> Revision: 15721 Author: maj Date: 2009-05-30 20:19:27 -0400 (Sat, 30 May 2009) Log Message: ----------- Refactoring context handling Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-31 00:07:19 UTC (rev 15720) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTileUtils.pm 2009-05-31 00:19:27 UTC (rev 15721) @@ -140,6 +140,9 @@ } push @flat, $a; } + if ($flat[-1]-$flat[-2]==1 and @flat % 2) { + push @flat, $flat[-1]; + } # component intervals are consecutive pairs my @decomp; while (my $a = shift @flat) { @@ -254,7 +257,7 @@ 
'mapping' => [1, 3]}, 'TX' => { 'q' => qr/[sf]/, 'h' => qr/[sf]/, - 'mapping' => [3, 3]}, # correct? + 'mapping' => [3, 3]}, 'TY' => { 'q' => qr/[sf]/, 'h' => qr/[sf]/, 'mapping' => [3, 3]} @@ -285,6 +288,11 @@ return ''; } $type = 'h' if $type eq 's'; + my $alg = $hit->algorithm; + + # pretreat (i.e., kludge it) + $alg =~ /^RPS/ and ($alg) = ($alg =~ /\(([^)]+)\)/); + for ($hit->algorithm) { /MEGABLAST/i && do { return qr/[s]/; @@ -317,6 +325,9 @@ sub _set_mapping { my $self = shift; my $alg = $self->hit->algorithm; + + # pretreat (i.e., kludge it) + $alg =~ /^RPS/ and ($alg) = ($alg =~ /\(([^)]+)\)/); for ($alg) { /MEGABLAST/i && do { @@ -356,8 +367,12 @@ return undef; } $type = 'hit' if $type eq 'subject'; + my $alg = $obj->algorithm; + + # pretreat (i.e., kludge it) + $alg =~ /^RPS/ and ($alg) = ($alg =~ /\(([^)]+)\)/); - for ($obj->algorithm) { + for ($alg) { /MEGABLAST/i && do { return 1; }; @@ -402,7 +417,7 @@ my( $self, @args ) = @_; my($type, $action, $beg, $end) = $self->_rearrange( [qw(TYPE ACTION START END)], @args); my @actions = qw( identities conserved searchutils ); - + # prep $type $self->throw("Type not specified") if !defined $type; $self->throw("Type '$type' unrecognized") unless grep(/^$type$/,qw(query hit subject)); @@ -412,43 +427,31 @@ $self->throw("Action not specified") if !defined $action; $self->throw("Action '$action' unrecognized") unless grep(/^$action$/, @actions); - if ( (!defined($beg) && !defined($end)) ) { + my ($len_id, $len_cons); + my $c = Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self, $type); + if ((defined $beg && !defined $end) || (!defined $beg && defined $end)) { + $self->throw("Both start and end are required"); + } + elsif ( (!defined($beg) && !defined($end)) || !$self->seq_str('match') ) { ## Get data for the whole alignment. 
+ # the reported values x mapping + $self->debug("Sequence data not present in report; returning data for entire HSP") unless $self->seq_str('match'); + ($len_id, $len_cons) = map { $c*$_ } ($self->num_identical, $self->num_conserved); for ($action) { $_ eq 'identities' && do { - return $self->num_identical; + return $len_id; }; $_ eq 'conserved' && do { - return $self->num_conserved; + return $len_cons; }; $_ eq 'searchutils' && do { - return ($self->num_identical, $self->num_conserved); + return ($len_id, $len_cons); }; do { $self->throw("What are YOU doing here?"); }; } } - elsif (!$self->seq_str('match')) { - $self->warn("Sequence data not present in report; returning data for entire HSP"); - for ($action) { - $_ eq 'identities' && do { - return $self->num_identical; - }; - $_ eq 'conserved' && do { - return $self->num_conserved; - }; - $_ eq 'searchutils' && do { - return ($self->num_identical, $self->num_conserved); - }; - do { - $self->throw("What are YOU doing here?"); - }; - } - } - elsif ((defined $beg && !defined $end) || (!defined $beg && defined $end)) { - $self->throw("Both start and end are required"); - } else { ## Get the substring representing the desired sub-section of aln. my($start,$stop) = $self->range($type); @@ -476,7 +479,7 @@ my $seq = ""; $seq = substr( $match_str, int( ($beg-$start)/Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self, $type) ), - int( ($end-$beg+1)/Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self, $type) ) + int( 1+($end-$beg)/Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self, $type) ) ); if(!CORE::length $seq) { @@ -484,9 +487,9 @@ } $seq =~ s/ //g; # remove space (no info). 
- my $len_cons = (CORE::length $seq)*(Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self,$type)); + $len_cons = (CORE::length $seq)*(Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self,$type)); $seq =~ s/\+//g; # remove '+' characters (conservative substitutions) - my $len_id = (CORE::length $seq)*(Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self,$type)); + $len_id = (CORE::length $seq)*(Bio::Search::Tiling::MapTileUtils::_mapping_coeff($self,$type)); for ($action) { $_ eq 'identities' && do { return $len_id; @@ -504,5 +507,33 @@ } } +package Bio::Search::Tiling::MapTileUtils; + +sub ints_as_text { + my $ints = shift; + my @ints = @$ints; + my %pos; + for (@ints) { + $pos{$$_[0]}++; + $pos{$$_[1]}++; + } + + my @pos = sort {$a<=>$b} keys %pos; + @pos = map {sprintf("%03d",$_)} @pos; +#header + my $max=0; + $max = (length > $max) ? length : $max for (@pos); + for my $j (0..$max-1) { + my $i = $max-1-$j; + my @line = map { substr($_, $j, 1) || '0' } @pos; + print join('', @line), "\n"; + } + print '-' x @pos, "\n"; + undef %pos; + @pos{map {sprintf("%d",$_)} @pos} = (0.. 
@pos); + foreach (@ints) { + print ' ' x $pos{$$_[0]}, '[', ' ' x ($pos{$$_[1]}-$pos{$$_[0]}-1), ']', ' ' x (@pos-$pos{$$_[1]}), "\n"; + } +} 1; From maj at dev.open-bio.org Sat May 30 20:20:10 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 30 May 2009 20:20:10 -0400 Subject: [Bioperl-guts-l] [15722] bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm: add algorithm accessor to interface Message-ID: <200905310020.n4V0KAJA019497@dev.open-bio.org> Revision: 15722 Author: maj Date: 2009-05-30 20:20:10 -0400 (Sat, 30 May 2009) Log Message: ----------- add algorithm accessor to interface Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-31 00:19:27 UTC (rev 15721) +++ bioperl-dev/trunk/Bio/Search/Tiling/TilingI.pm 2009-05-31 00:20:10 UTC (rev 15722) @@ -366,4 +366,22 @@ #alias sub rewind { shift->rewind_tilings(@_) } +=head2 INFORMATIONAL ACCESSORS + +=head2 algorithm + + Title : algorithm + Usage : $tiling->algorithm + Function: Retrieve the algorithm name associated with the + invocant's hit object + Returns : scalar string + Args : + +=cut + +sub algorithm{ + my ($self, @args) = @_; + $self->throw_not_implemented; +} + 1; From maj at dev.open-bio.org Sat May 30 20:21:22 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sat, 30 May 2009 20:21:22 -0400 Subject: [Bioperl-guts-l] [15723] bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm: Refactor of strand/frame context handling; correct length conversions now in _calc_coverage_map Message-ID: <200905310021.n4V0LMHA019575@dev.open-bio.org> Revision: 15723 Author: maj Date: 2009-05-30 20:21:22 -0400 (Sat, 30 May 2009) Log Message: ----------- Refactor of strand/frame context handling; correct length conversions now in _calc_coverage_map Modified Paths: --------------
bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-31 00:20:10 UTC (rev 15722) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-31 00:21:22 UTC (rev 15723) @@ -54,6 +54,30 @@ appropriately-named public object attributes. See L for more info on the algorithm. +=head2 STRAND/FRAME CONTEXTS + +In BLASTX, TBLASTN, and TBLASTX reports, strand and frame information +are reported for the query, subject, or query and subject, +respectively, for each HSP. Tilings for these sequence types are only +meaningful when they include HSPs in the same strand and frame, or +"context". So, in these situations, the context must be specified +in the method calls or the methods will throw. + +Contexts are specified as strings: C<[ 'all' | [m|p][_|0|1|2] ]>, where +C<all> = all HSPs (will throw if context must be specified), C<m> = minus +strand, C<p>
= plus strand, and C<_> = no frame info, C<0,1,2> = respective +(absolute) frame. The L<_context()> method will convert a (strand, +frame) specification to a context string, e.g.: + + $context = $self->_context(-strand=>-1, -frame=>-2); + +returns C. + +The contexts present among the HSPs in a hit are identified and stored +for convenience upon object construction. These are accessed off the +object with the L method. If contexts don't apply for the +given report, this returns C<('all')>. + =head1 DESIGN NOTE The major calculations are made just-in-time, and then memoized. So, @@ -112,7 +136,7 @@ use warnings; # Object preamble - inherits from Bio::Root::Root -use lib '../../..'; +#use lib '../../..'; use Bio::Root::Root; use Bio::Search::Tiling::TilingI; @@ -130,44 +154,59 @@ Function: Builds a new Bio::Search::Tiling::GenericTiling object Returns : an instance of Bio::Search::Tiling::GenericTiling Args : -hit => $a_Bio_Search_Hit_HitI_object - filtering args for nucleotide data: - -qstrand => [[ 1 | -1 ]] - -hstrand => [[ 1 | -1 ]] - -qframe => [[ -2 | -1 | 0 | 1 | 2 ]] - -hframe => [[ -2 | -1 | 0 | 1 | 2 ]] - Note : Not all filters are valid for all BLAST/FAST - algorithms. The constructor will warn when, - e.g., -qstrand is set for BLASTP data. 
- + general filter function: + -hsp_filter => sub { my $this_hsp = shift; + ...; + return 1 if $wanted; + return 0; } =cut sub new { my $class = shift; my @args = @_; - my $self = $class->SUPER::new; - my($hit, $qstrand, $hstrand, $qframe, $hframe) = $self->_rearrange( [qw( HIT QSTRAND HSTRAND QFRAME HFRAME )], @args ); + my $self = $class->SUPER::new(@args); + my($hit, $filter) = $self->_rearrange( [qw( HIT HSP_FILTER)], @args ); $self->throw("HitI object required") unless $hit; $self->throw("Argument must be HitI object") unless ( ref $hit && $hit->isa('Bio::Search::Hit::HitI') ); $self->{hit} = $hit; + $self->_set_mapping(); + $self->{"_algorithm"} = $hit->algorithm; my @hsps; - $self->_check_new_args($qstrand, $hstrand, $qframe, $hframe); - # filter if requested - while (local $_ = $hit->next_hsp) { - push @hsps, $_ if ( ( !$qstrand || ($qstrand == $_->strand('query'))) && - ( !$hstrand || ($hstrand == $_->strand('hit')) ) && - ( !defined $qframe || ($qframe == $_->frame('query')) ) && - ( !defined $hframe || ($hframe == $_->frame('hit')) ) ); + # apply filter function if requested + if ( defined $filter ) { + if ( ref($filter) eq 'CODE' ) { + @hsps = map { $filter->($_) ?
$_ : () } @hsps; + } + else { + $self->warn("-filter is not a coderef; ignoring"); + } } + else { + @hsps = $hit->hsps; + } + + # identify available contexts + for my $t qw( query hit ) { + my %contexts; + if ($self->_has_logical_length($t)) { + for my $i (0..$#hsps) { + my $ctxt = $self->_context(-strand => $hsps[$i]->strand($t), + -frame => $hsps[$i]->frame($t)); + $contexts{$ctxt} ||= []; + push @{$contexts{$ctxt}}, $i; + } + } + else { + $contexts{'all'} = [(0..$#hsps)]; + } + $self->{"_contexts_${t}"} = \%contexts; + } + $self->warn("No HSPs present in hit after filtering") unless (@hsps); $self->hsps(\@hsps); - $self->_set_mapping(); - $self->{"strand_query"} = $qstrand; - $self->{"strand_hit"} = $hstrand; - $self->{"frame_query"} = $qframe; - $self->{"strand_hit"} = $hframe; return $self; } @@ -220,26 +259,29 @@ =head2 identities Title : identities - Usage : $tiling->identities($type, $action) + Usage : $tiling->identities($type, $action, $context) Function: Retrieve the calculated number of identities for the invocant Example : Returns : value of identities (a scalar) Args : scalar $type: one of 'hit', 'subject', 'query' default is 'query' - option scalar $action: one of 'exact', 'est', 'max' + option scalar $action: one of 'exact', 'est', 'fast', 'max' default is 'exact' + option scalar $context: strand/frame context string Note : getter only + =cut sub identities{ my $self = shift; - my ($type, $action) = @_; + my ($type, $action, $context) = @_; $self->_check_type_arg(\$type); $self->_check_action_arg(\$action); - if (!defined $self->{"identities_${type}_${action}"}) { - $self->_calc_stats($type, $action); + $self->_check_context_arg($type, \$context); + if (!defined $self->{"identities_${type}_${action}_${context}"}) { + $self->_calc_stats($type, $action, $context); } - return $self->{"identities_${type}_${action}"}; + return $self->{"identities_${type}_${action}_${context}"}; } =head2 conserved @@ -251,100 +293,115 @@ Returns : value of conserved (a 
scalar) Args : scalar $type: one of 'hit', 'subject', 'query' default is 'query' - option scalar $action: one of 'exact', 'est', 'max' + option scalar $action: one of 'exact', 'est', 'fast', 'max' default is 'exact' + option scalar $context: strand/frame context string Note : getter only =cut sub conserved{ my $self = shift; - my ($type, $action) = @_; + my ($type, $action, $context) = @_; $self->_check_type_arg(\$type); $self->_check_action_arg(\$action); - if (!defined $self->{"conserved_${type}_${action}"}) { - $self->_calc_stats($type, $action); + $self->_check_context_arg($type, \$context); + if (!defined $self->{"conserved_${type}_${action}_${context}"}) { + $self->_calc_stats($type, $action, $context); } - return $self->{"conserved_${type}_${action}"}; + return $self->{"conserved_${type}_${action}_${context}"}; } =head2 length Title : length - Usage : $tiling->length($type, $action) + Usage : $tiling->length($type, $action, $context) Function: Retrieve the total length of aligned residues for the seq $type Example : Returns : value of length (a scalar) Args : scalar $type: one of 'hit', 'subject', 'query' default is 'query' - option scalar $action: one of 'exact', 'est', 'max' + option scalar $action: one of 'exact', 'est', 'fast', 'max' default is 'exact' + option scalar $context: strand/frame context string Note : getter only =cut sub length{ my $self = shift; - my ($type,$action) = @_; + my ($type,$action,$context) = @_; $self->_check_type_arg(\$type); $self->_check_action_arg(\$action); - if (!defined $self->{"length_${type}_${action}"}) { - $self->_calc_stats($type, $action); + $self->_check_context_arg($type, \$context); + if (!defined $self->{"length_${type}_${action}_${context}"}) { + $self->_calc_stats($type, $action, $context); } - return $self->{"length_${type}_${action}"}; + return $self->{"length_${type}_${action}_${context}"}; } -=head2 frac_identical +=head2 frac - Title : frac_identical - Usage : $tiling->frac_identical($type, $denom) + Title 
: frac + Usage : $tiling->frac($type, $denom, $action, $context, $method) Function: Return the fraction of sequence length consisting - of identical pairs, with respect to $denom + of desired kinds of pairs (given by $method), + with respect to $denom Returns : scalar float - Args : scalar $type, one of 'hit', 'subject', 'query' - scalar $denom, one of 'total', 'aligned' - Note : $denom == 'aligned', return identities/num_aligned - $denom == 'total', return identities/_reported_length + Args : -type => one of 'hit', 'subject', 'query' + -denom => one of 'total', 'aligned' + -action => one of 'exact', 'est', 'fast', 'max' + -context => strand/frame context string + -method => one of 'identical', 'conserved' + Note : $denom == 'aligned', return desired_stat/num_aligned + $denom == 'total', return desired_stat/_reported_length (i.e., length of the original input sequences) - + Note : In keeping with the spirit of Bio::Search::HSP::HSPI, + reported lengths of translated dna are reduced by + a factor of 3, to provide fractions relative to + amino acid coordinates. + =cut -sub frac_identical { - my ($self, $type, $denom) = @_; - if (@_ == 1) { - $type = ''; - $self->_check_type_arg(\$type); # set default - $denom = 'total'; # is this the right default? +sub frac { + my $self = shift; + my @args = @_; + my ($type, $denom, $action, $context, $method) = $self->_rearrange([qw(TYPE DENOM ACTION CONTEXT METHOD)], @args); + $self->_check_type_arg(\$type); + $self->_check_action_arg(\$action); + $self->_check_context_arg($type, \$context); + unless ($method and grep(/^$method$/, qw( identical conserved ))) { + $self->throw("-method must specified; one of ('identical', 'conserved')"); } - elsif (@_ == 2) { @@ Diff output truncated at 10000 characters.
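The STRAND/FRAME CONTEXTS documentation in revision 15723 describes context strings of the form [m|p][_|0|1|2], and the new constructor groups HSP indices by context. The Perl sketch below shows one way such a conversion and grouping could work. It is not the _context() method from the diff: context_string() is a hypothetical standalone helper, and it assumes the frame is stored as its absolute value (the POD calls 0, 1, 2 the "(absolute) frame") and that an undefined frame maps to '_'.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a strand (1 or -1) and an optional frame
# (-2..2) into a context string of the form [m|p][_|0|1|2].
# Assumes the frame is stored as its absolute value and that an
# undefined frame means "no frame info" ('_').
sub context_string {
    my ($strand, $frame) = @_;
    die "strand must be 1 or -1\n"
        unless defined $strand && ($strand == 1 || $strand == -1);
    die "frame must be in -2..2\n" if defined $frame && abs($frame) > 2;
    my $s = $strand > 0 ? 'p' : 'm';
    my $f = defined $frame ? abs($frame) : '_';
    return $s . $f;
}

# Made-up HSPs as [strand, frame] pairs; group their indices by context,
# similar in spirit to the %contexts hash built in the new constructor.
my @hsps = ( [1, 0], [-1, -2], [1, 0], [-1, undef] );
my %contexts;
for my $i (0 .. $#hsps) {
    my $ctxt = context_string(@{ $hsps[$i] });
    push @{ $contexts{$ctxt} }, $i;
}
print "$_: @{ $contexts{$_} }\n" for sort keys %contexts;
```

Here HSPs 0 and 2 share context p0, HSP 1 falls in m2, and HSP 3 (no frame) in m_. Keying per-context results this way is what lets accessors such as identities() memoize a separate value for each (type, action, context) combination.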
@@ From bugzilla-daemon at portal.open-bio.org Sun May 31 16:26:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 16:26:34 -0400 Subject: [Bioperl-guts-l] [Bug 2844] New: Patch to add "revtrans" method to Bio::Tools::SeqPattern Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2844 Summary: Patch to add "revtrans" method to Bio::Tools::SeqPattern Product: BioPerl Version: unspecified Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: bioperl-dev AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: vecchi.b at gmail.com The attached patch adds one method to Bio::Tools::SeqPattern: revtrans. This method reverse translates a Bio::Tools::SeqPattern instance of the type "Amino" into a Bio::Tools::SeqPattern instance of the type "Dna". It was discussed in the mailing list as a valuable addition, and it has already been accepted as a script. Regarding its addition as a method, Heikki Lehvaslaiho had concerns about it making the module slower. Therefore, I attach the results of a benchmark (and the script that runs it), showing that it doesn't significantly lower the speed of the original methods of the SeqPattern module. The patch produces the following modifications: Bio/Tools/SeqPattern.pm: it adds the "revtrans" method, along with its documentation. This method is simply a thin wrapper over the main "_reverse_translate_motif" method of the Bio/Tools/SeqPattern/Revtrans.pm module. The latter is only 'require-d', and the helper subroutine is only imported when the "revtrans" method is called. Bio/Tools/SeqPattern/Revtrans.pm: This module contains all of the logic for the "revtrans" subroutine, and exports the "_reverse_translate_motif" subroutine upon demand from the main SeqPattern module. t/SeqTools/Revtrans.t: new test file for the Revtrans.pm module. t/SeqTools/SeqPattern.t: Added tests to assess the correct functionality and error checking of the new method.
Needless to say, if the patch is accepted, I'd gladly commit to maintaining it.
From bugzilla-daemon at portal.open-bio.org Sun May 31 16:27:32 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 16:27:32 -0400 Subject: [Bioperl-guts-l] [Bug 2844] Patch to add "revtrans" method to Bio::Tools::SeqPattern In-Reply-To: Message-ID: <200905312027.n4VKRWSx030425@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2844 ------- Comment #1 from vecchi.b at gmail.com 2009-05-31 16:27 EST ------- Created an attachment (id=1311) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1311&action=view) The svn diff to apply the patch
From bugzilla-daemon at portal.open-bio.org Sun May 31 16:28:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 16:28:21 -0400 Subject: [Bioperl-guts-l] [Bug 2844] Patch to add "revtrans" method to Bio::Tools::SeqPattern In-Reply-To: Message-ID: <200905312028.n4VKSLNi030609@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2844 ------- Comment #2 from vecchi.b at gmail.com 2009-05-31 16:28 EST ------- Created an attachment (id=1312) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1312&action=view) The benchmarking script
From bugzilla-daemon at portal.open-bio.org Sun May 31 16:28:47 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 16:28:47 -0400 Subject: [Bioperl-guts-l] [Bug 2844] Patch to add "revtrans" method to Bio::Tools::SeqPattern In-Reply-To: Message-ID: <200905312028.n4VKSltE030650@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2844 ------- Comment #3 from vecchi.b at gmail.com 2009-05-31 16:28 EST ------- Created an attachment (id=1313) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1313&action=view) The results of the benchmark
From bugzilla-daemon at portal.open-bio.org Sun May 31 16:59:03 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 16:59:03 -0400 Subject: [Bioperl-guts-l] [Bug 2844] Patch to add "revtrans" method to Bio::Tools::SeqPattern In-Reply-To: Message-ID: <200905312059.n4VKx3bD032504@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2844 ------- Comment #4 from cjfields at bioperl.org 2009-05-31 16:59 EST ------- 'Revtrans' could mean 'reverse transcribe' or 'reverse translate'. In fact it almost always means the former in my line of work, whereas you are using it to mean the latter. I suggest disambiguating that (maybe 'rtranslate' or 'revtranslate').
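Bug 2844 is about reverse translating (not reverse transcribing) an "Amino" SeqPattern into a "Dna" one. As a minimal sketch of the idea only, assuming a plain peptide string with no pattern syntax and the standard genetic code, each residue can be replaced by a regex alternation of its codons. `backtranslate_sketch` and `%CODONS` are hypothetical names; this is not the Revtrans.pm code attached to the bug, which works on SeqPattern objects.

```perl
use strict;
use warnings;

# Hypothetical codon table: the standard genetic code ('*' = stop).
# Not taken from Bio::Tools::SeqPattern.
my %CODONS = (
    A => [qw(GCT GCC GCA GCG)],         C => [qw(TGT TGC)],
    D => [qw(GAT GAC)],                 E => [qw(GAA GAG)],
    F => [qw(TTT TTC)],                 G => [qw(GGT GGC GGA GGG)],
    H => [qw(CAT CAC)],                 I => [qw(ATT ATC ATA)],
    K => [qw(AAA AAG)],                 L => [qw(TTA TTG CTT CTC CTA CTG)],
    M => [qw(ATG)],                     N => [qw(AAT AAC)],
    P => [qw(CCT CCC CCA CCG)],         Q => [qw(CAA CAG)],
    R => [qw(CGT CGC CGA CGG AGA AGG)], S => [qw(TCT TCC TCA TCG AGT AGC)],
    T => [qw(ACT ACC ACA ACG)],         V => [qw(GTT GTC GTA GTG)],
    W => [qw(TGG)],                     Y => [qw(TAT TAC)],
    '*' => [qw(TAA TAG TGA)],
);

# Turn a plain peptide string into a DNA regex string in which every
# residue becomes a non-capturing alternation of its codons.
sub backtranslate_sketch {
    my ($peptide) = @_;
    my $pattern = '';
    for my $aa (split //, uc $peptide) {
        my $codons = $CODONS{$aa}
            or die "unsupported residue '$aa' in this sketch\n";
        $pattern .= '(?:' . join('|', @$codons) . ')';
    }
    return $pattern;
}

my $dna_re = backtranslate_sketch('MW');
print "$dna_re\n";    # prints (?:ATG)(?:TGG)
print "matches\n" if 'ATGTGG' =~ /^$dna_re$/;
```

Using explicit alternations rather than IUPAC degenerate codes keeps the sketch exact: serine's codons (TCN and AGY), for example, cannot be written as a single degenerate triplet without also admitting codons for other residues.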
From maj at dev.open-bio.org Sun May 31 17:20:39 2009 From: maj at dev.open-bio.org (Mark Allen Jensen) Date: Sun, 31 May 2009 17:20:39 -0400 Subject: [Bioperl-guts-l] [15724] bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm: context for tiling iterator Message-ID: <200905312120.n4VLKdrd001501@dev.open-bio.org> Revision: 15724 Author: maj Date: 2009-05-31 17:20:38 -0400 (Sun, 31 May 2009) Log Message: ----------- context for tiling iterator Modified Paths: -------------- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm Modified: bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm =================================================================== --- bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-31 00:21:22 UTC (rev 15723) +++ bioperl-dev/trunk/Bio/Search/Tiling/MapTiling.pm 2009-05-31 21:20:38 UTC (rev 15724) @@ -230,9 +230,10 @@ sub next_tiling{ my $self = shift; - my $type = shift; + my ($type, $context) = @_; $self->_check_type_arg(\$type); - return $self->_tiling_iterator($type)->(); + $self->_check_context_arg($type, \$context); + return $self->_tiling_iterator($type, $context)->(); } =head2 rewind_tilings @@ -249,9 +250,10 @@ sub rewind_tilings{ my $self = shift; - my $type = shift; + my ($type,$context) = @_ $self->_check_type_arg(\$type); - return $self->_tiling_iterator($type)->('REWIND'); + $self->_check_context_arg($type, \$context); + return $self->_tiling_iterator($type, $context)->('REWIND'); } =head2 STATISTICS @@ -897,7 +899,7 @@ and returns only the maximum identites/positives over overlapping HSP for the component interval. No averaging is involved here. - 'fast' is doesn't involve tiling at all (hence the name), + 'fast' doesn't involve tiling at all (hence the name), but it seems like a very good estimate, and uses only reported values, and so does not require sequence data. 
It calculates an average of reported identities, conserved
From bugzilla-daemon at portal.open-bio.org Sun May 31 17:27:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 17:27:42 -0400 Subject: [Bioperl-guts-l] [Bug 2844] Patch to add "revtrans" method to Bio::Tools::SeqPattern In-Reply-To: Message-ID: <200905312127.n4VLRgQE001694@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2844 ------- Comment #5 from jason at bioperl.org 2009-05-31 17:27 EST ------- or backtranslate - http://www.biorecipes.com/BackTranslate/code.html
From bugzilla-daemon at portal.open-bio.org Sun May 31 18:01:18 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 31 May 2009 18:01:18 -0400 Subject: [Bioperl-guts-l] [Bug 2844] Patch to add "revtrans" method to Bio::Tools::SeqPattern In-Reply-To: Message-ID: <200905312201.n4VM1I3O003843@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2844 ------- Comment #6 from vecchi.b at gmail.com 2009-05-31 18:01 EST ------- Created an attachment (id=1314) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1314&action=view) diff file for the addition of "backtranslate" method Good points. I've settled on "backtranslate", as per Jason's suggestion. The newly attached diff file contains the full patch, with the method and file names changed accordingly.
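Revision 15724 threads a context argument through next_tiling() and rewind_tilings(), both of which call the code reference returned by _tiling_iterator($type, $context): with no argument to get the next tiling, or with 'REWIND' to reset it. The sketch below shows one way such a per-(type, context) closure iterator could work. The `%ITERATORS` cache, the `"$type|$context"` key, the helper names and the example context strings are all assumptions for illustration, not MapTiling internals.

```perl
use strict;
use warnings;

# Hypothetical cache of iterators, one per (type, context) pair.
my %ITERATORS;

# Build a closure that walks @$items; passing 'REWIND' resets it.
sub make_iterator {
    my ($items) = @_;
    my $i = 0;
    return sub {
        my ($cmd) = @_;
        if (defined $cmd && $cmd eq 'REWIND') {
            $i = 0;
            return;
        }
        return $i < @$items ? $items->[$i++] : undef;
    };
}

# Hypothetical stand-in for _tiling_iterator($type, $context): the same
# closure is returned for a given key, so successive calls advance it.
sub tiling_iterator_sketch {
    my ($tilings_by_key, $type, $context) = @_;
    my $key = "$type|$context";
    $ITERATORS{$key} ||= make_iterator($tilings_by_key->{$key} || []);
    return $ITERATORS{$key};
}

my %tilings = ('query|p1' => ['tiling A', 'tiling B']);
while (defined(my $t = tiling_iterator_sketch(\%tilings, 'query', 'p1')->())) {
    print "$t\n";
}
tiling_iterator_sketch(\%tilings, 'query', 'p1')->('REWIND');
```

Keying the cache on both type and context is what lets iteration over one strand/frame context proceed independently of another; in this sketch the tilings passed on later calls for an existing key are ignored, whereas a real implementation would keep the cache on the object.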