From scain at dev.open-bio.org Wed Aug 1 23:29:38 2007 From: scain at dev.open-bio.org (Scott Cain) Date: Thu, 02 Aug 2007 03:29:38 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/DB/GFF/Adaptor berkeleydb.pm, 1.27, 1.28 Message-ID: <200708020329.l723Tcnl013736@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/DB/GFF/Adaptor In directory dev.open-bio.org:/tmp/cvs-serv13727 Modified Files: berkeleydb.pm Log Message: fixing empty fasta database (the adaptor was partially changed to use locally scoped filehandles) Index: berkeleydb.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/DB/GFF/Adaptor/berkeleydb.pm,v retrieving revision 1.27 retrieving revision 1.28 diff -C2 -d -r1.27 -r1.28 *** berkeleydb.pm 28 Sep 2006 18:37:59 -0000 1.27 --- berkeleydb.pm 2 Aug 2007 03:29:35 -0000 1.28 *************** *** 363,366 **** --- 363,370 ---- my $self = shift; my ($io_handle,$id) = @_; + + $self->warn("handle:$io_handle,id:$id"); + die; + my $file = $self->_fasta_file; my $loaded = 0; *************** *** 375,381 **** while (<$io_handle>) { $loaded++ if /^>/; ! print F $_; } ! close F; my $dna_db = Bio::DB::Fasta->new($file) or $self->throw("Can't reindex sequence file: $@"); $self->dna_db($dna_db); --- 379,385 ---- while (<$io_handle>) { $loaded++ if /^>/; ! print $F $_; } ! close $F; my $dna_db = Bio::DB::Fasta->new($file) or $self->throw("Can't reindex sequence file: $@"); $self->dna_db($dna_db); From scain at dev.open-bio.org Wed Aug 1 23:35:17 2007 From: scain at dev.open-bio.org (Scott Cain) Date: Thu, 02 Aug 2007 03:35:17 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/DB/GFF/Adaptor berkeleydb.pm, 1.28, 1.29 Message-ID: <200708020335.l723ZHF9013765@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/DB/GFF/Adaptor In directory dev.open-bio.org:/tmp/cvs-serv13756 Modified Files: berkeleydb.pm Log Message: removing a deadly debugging statement Index: berkeleydb.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/DB/GFF/Adaptor/berkeleydb.pm,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** berkeleydb.pm 2 Aug 2007 03:29:35 -0000 1.28 --- berkeleydb.pm 2 Aug 2007 03:35:15 -0000 1.29 *************** *** 363,370 **** my $self = shift; my ($io_handle,$id) = @_; - - $self->warn("handle:$io_handle,id:$id"); - die; - my $file = $self->_fasta_file; my $loaded = 0; --- 363,366 ---- From bugzilla-daemon at portal.open-bio.org Thu Aug 2 04:43:22 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 04:43:22 -0400 Subject: [Bioperl-guts-l] [Bug 2341] New: Bio::Map bug when setting a marker position as zero Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2341 Summary: Bio::Map bug when setting a marker position as zero Product: BioPerl Version: main-trunk Platform: PC OS/Version: Linux Status: NEW Severity: major Priority: P2 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: n.haigh at sheffield.ac.uk I have a Bio::Map::Marker object who's position I want to set in two different Bio::Map::SimpleMap objects. One of these maps has units in cM and the marker has a position of 0cM. Adding this as a position to the marker shows no errors or warnings. However, when trying to retrieve the positions using get_positions I get an exception thrown: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The value has not been set, can't convert to numeric STACK: Error::throw STACK: Bio::Root::Root::throw /home/bo1nsh/cvswc/bioperl-live/Bio/Root/Root.pm:357 STACK: Bio::Map::Position::numeric /home/bo1nsh/cvswc/bioperl-live/Bio/Map/Position.pm:252 STACK: Bio::Map::Position::sortable /home/bo1nsh/cvswc/bioperl-live/Bio/Map/Position.pm:394 STACK: Bio::Map::MappableI::get_positions /home/bo1nsh/cvswc/bioperl-live/Bio/Map/MappableI.pm:156 STACK: test.pl:22 ----------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 2 04:54:17 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 04:54:17 -0400 Subject: [Bioperl-guts-l] [Bug 2341] Bio::Map bug when setting a marker position as zero In-Reply-To: Message-ID: <200708020854.l728sHVv001911@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2341 ------- Comment #1 from bix at sendu.me.uk 2007-08-02 04:54 EST ------- Please supply the code that generates the exception. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 2 04:55:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 04:55:35 -0400 Subject: [Bioperl-guts-l] [Bug 2341] Bio::Map bug when setting a marker position as zero In-Reply-To: Message-ID: <200708020855.l728tZXX002108@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2341 ------- Comment #2 from n.haigh at sheffield.ac.uk 2007-08-02 04:55 EST ------- Created an attachment (id=722) --> (http://bugzilla.open-bio.org/attachment.cgi?id=722&action=view) script showing the bug -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sendu at dev.open-bio.org Thu Aug 2 05:15:04 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Thu, 02 Aug 2007 09:15:04 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Map Marker.pm,1.26,1.27 Message-ID: <200708020915.l729F4We014639@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Map In directory dev.open-bio.org:/tmp/cvs-serv14614/Bio/Map Modified Files: Marker.pm Log Message: fixed bug in setting position value of 0 Index: Marker.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Map/Marker.pm,v retrieving revision 1.26 retrieving revision 1.27 diff -C2 -d -r1.26 -r1.27 *** Marker.pm 14 Jun 2007 14:16:12 -0000 1.26 --- Marker.pm 2 Aug 2007 09:15:02 -0000 1.27 *************** *** 218,222 **** my $pos = Bio::Map::Position->new(); $pos->map($map) if $map; ! $pos->value($value) if $value; $pos->element($self); return $pos; --- 218,222 ---- my $pos = Bio::Map::Position->new(); $pos->map($map) if $map; ! $pos->value($value) if defined($value); $pos->element($self); return $pos; From bugzilla-daemon at portal.open-bio.org Thu Aug 2 05:15:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 05:15:48 -0400 Subject: [Bioperl-guts-l] [Bug 2341] Bio::Map bug when setting a marker position as zero In-Reply-To: Message-ID: <200708020915.l729Fmga004272@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2341 bix at sendu.me.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from bix at sendu.me.uk 2007-08-02 05:15 EST ------- Should be fixed now. Thanks for the bug report. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 2 05:32:53 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 05:32:53 -0400 Subject: [Bioperl-guts-l] [Bug 2341] Bio::Map bug when setting a marker position as zero In-Reply-To: Message-ID: <200708020932.l729WroC005942@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2341 ------- Comment #4 from n.haigh at sheffield.ac.uk 2007-08-02 05:32 EST ------- The fix doesn't seem to have come through on HEAD yet - have you committed? ;-) Also, should I commit a test to ensure it doesn't crop up again? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 2 05:34:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 05:34:19 -0400 Subject: [Bioperl-guts-l] [Bug 2341] Bio::Map bug when setting a marker position as zero In-Reply-To: Message-ID: <200708020934.l729YJeM007103@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2341 ------- Comment #5 from n.haigh at sheffield.ac.uk 2007-08-02 05:34 EST ------- Doh, my bad - I forgot to change my PERL5LIB env variable! :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sendu at dev.open-bio.org Thu Aug 2 05:42:59 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Thu, 02 Aug 2007 09:42:59 +0000 Subject: [Bioperl-guts-l] bioperl-live/t Map.t,1.13,1.14 Message-ID: <200708020942.l729gx3Y014876@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/t In directory dev.open-bio.org:/tmp/cvs-serv14851/t Modified Files: Map.t Log Message: test for 0 valued positions Index: Map.t =================================================================== RCS file: /home/repository/bioperl/bioperl-live/t/Map.t,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** Map.t 27 Jun 2007 10:16:37 -0000 1.13 --- Map.t 2 Aug 2007 09:42:57 -0000 1.14 *************** *** 8,12 **** use BioperlTest; ! test_begin(-tests => 147); use_ok('Bio::Map::SimpleMap'); --- 8,12 ---- use BioperlTest; ! test_begin(-tests => 150); use_ok('Bio::Map::SimpleMap'); *************** *** 80,83 **** --- 80,90 ---- is $p[1], 200; is $p[2], 300; + + # make sure we can add positions with 0 value + my $map3 = Bio::Map::SimpleMap->new(); + $marker->add_position($map3, 0); + ok my @positions = $marker->get_positions($map3); + is @positions, 1; + is $positions[0]->value, 0; } From bugzilla-daemon at portal.open-bio.org Thu Aug 2 05:43:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Aug 2007 05:43:36 -0400 Subject: [Bioperl-guts-l] [Bug 2341] Bio::Map bug when setting a marker position as zero In-Reply-To: Message-ID: <200708020943.l729ha0C008315@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2341 ------- Comment #6 from bix at sendu.me.uk 2007-08-02 05:43 EST ------- I've just committed a test for this as well. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From cjfields at dev.open-bio.org Thu Aug 2 13:48:23 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Thu, 02 Aug 2007 17:48:23 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Phenotype/OMIM OMIMparser.pm, 1.24, 1.25 Message-ID: <200708021748.l72HmN9p015517@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Phenotype/OMIM In directory dev.open-bio.org:/tmp/cvs-serv15492 Modified Files: OMIMparser.pm Log Message: bug fix and cleanup; genemap not loaded in some cases, so let Bio::Root::IO do the error throwing. Index: OMIMparser.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Phenotype/OMIM/OMIMparser.pm,v retrieving revision 1.24 retrieving revision 1.25 diff -C2 -d -r1.24 -r1.25 *** OMIMparser.pm 14 Jun 2007 14:16:12 -0000 1.24 --- OMIMparser.pm 2 Aug 2007 17:48:21 -0000 1.25 *************** *** 406,412 **** if ( defined $value ) { $self->{ "_genemap_file_name" } = $value; ! if ( $value =~ /\W/ ) { ! _genemap_hash( $self->_read_genemap( $value ) ); ! } } --- 406,410 ---- if ( defined $value ) { $self->{ "_genemap_file_name" } = $value; ! $self->_genemap_hash( $self->_read_genemap( $value ) ); } *************** *** 875,879 **** my $line = ""; my %genemap_hash = (); ! my $genemap_file = Bio::Root::IO->new->new( -file => $genemap_file_name ); my @a = (); my %gm = (); --- 873,877 ---- my $line = ""; my %genemap_hash = (); ! my $genemap_file = Bio::Root::IO->new( -file => $genemap_file_name ); my @a = (); my %gm = (); From cjfields at dev.open-bio.org Thu Aug 2 16:47:33 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Thu, 02 Aug 2007 20:47:33 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/SearchIO infernal.pm,1.7,1.8 Message-ID: <200708022047.l72KlXgN015699@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/SearchIO In directory dev.open-bio.org:/tmp/cvs-serv15674 Modified Files: infernal.pm Log Message: infernal 0.81 parsing not implemented yet Index: infernal.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/SearchIO/infernal.pm,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** infernal.pm 14 Jun 2007 14:16:14 -0000 1.7 --- infernal.pm 2 Aug 2007 20:47:31 -0000 1.8 *************** *** 187,190 **** --- 187,191 ---- $convert && $self->convert_meta($convert); $desc && $self->query_description($desc); + $version ||= $DEFAULT_VERSION; $self->version($version); *************** *** 209,343 **** sub next_result { my ($self) = @_; ! my $seentop = 0; ! local $/ = "\n"; ! local $_; ! my ($accession, $db, $algorithm, $model, $description, $version) = ! ($self->query_accession, $self->database, $self->algorithm, ! $self->model, $self->query_description, $self->version); ! my $maxscore; ! my $cutoff = $self->hsp_minscore; ! $self->start_document(); ! local ($_); ! my $line; ! my ($lasthit, $lastscore, $laststart, $lastend); ! my $hitline; ! PARSER: ! while ( defined( $line = $self->_readline ) ) { ! next if $line =~ m{^\s+$}; ! # bypass this for now... ! next if $line =~ m{^HMM\shit}; ! if ($line =~ m{^sequence:\s+(\S+)} ){ ! if (!$self->within_element('result')) { ! $seentop = 1; ! $self->start_element({'Name' => 'Result'}); ! $self->element_hash({ ! 'Infernal_program' => $algorithm, ! 'Infernal_query-def' => $model, ! 'Infernal_query-acc' => $accession, ! 'Infernal_querydesc' => $description, ! 'Infernal_db' => $db ! }); ! } ! if ($self->in_element('hit')) { ! $self->element_hash({'Hit_score' => $maxscore, ! 'Hit_bits' => $maxscore}); ! $maxscore = undef; ! $self->end_element({'Name' => 'Hit'}); ! } ! $lasthit = $1; ! } elsif ($line =~ m{^hit\s+\d+\s+:\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+bits}xms) { ! ($laststart, $lastend, $lastscore) = ($1, $2, $3); ! $maxscore = $lastscore unless $maxscore; ! if ($lastscore > $cutoff) { ! if (!$self->within_element('hit')) { ! my ($gi, $acc, $ver) = $self->_get_seq_identifiers($lasthit); ! $self->start_element({'Name' => 'Hit'}); ! $self->element_hash({ ! 'Hit_id' => $lasthit, ! 'Hit_accession' => $ver ? "$acc.$ver" : ! $acc ? $acc : $lasthit, ! 'Hit_gi' => $gi ! }); ! } ! # necessary as infernal 0.71 has repeated hit line ! if (!$self->in_element('hsp')) { ! $self->start_element({'Name' => 'Hsp'}); ! } ! $maxscore = ($maxscore < $lastscore) ? $lastscore : ! $maxscore; ! } ! } elsif ($line =~ m{^(\s+)[<>\{\}\(\)\[\]:_,-\.]+}xms) { # start of HSP ! $self->_pushback($line); # set up for loop ! # what is length of the gap to the structure data? ! my $offset = length($1); ! my ($ct, $strln) = 0; ! my $hsp; ! HSP: ! my %hsp_key = ('0' => 'meta', ! '1' => 'query', ! '2' => 'midline', ! '3' => 'hit'); ! HSP: ! while ($line = $self->_readline) { ! next if $line =~ m{^\s*$}; # toss empty lines ! chomp $line; ! # exit loop if at end of file or upon next hit/HSP ! if (!defined($line) || $line =~ m{^\S+}) { ! $self->_pushback($line); ! last HSP; ! } ! # iterate to keep track of each line (4 lines per hsp block) ! my $iterator = $ct%4; ! # strlen set only with structure lines (proper length) ! $strln = length($line) if $iterator == 0; ! # only grab the data needed (hit start and stop in hit line above) ! ! my $data = substr($line, $offset, $strln-$offset); ! $hsp->{ $hsp_key{$iterator} } .= $data; ! $ct++; ! } ! # query start, end are from the actual query length (entire hit is ! # mapped to CM data, so all CM data is represented) ! # works for now... ! if ($self->in_element('hsp')) { ! my $strlen = $hsp->{'query'} =~ tr{A-Za-z}{A-Za-z}; ! ! my $metastr; ! # Ugh...these should be passed in a hash ! $metastr = ($self->convert_meta) ? ($self->simple_meta($hsp->{'meta'})) : ! ($hsp->{'meta'}); ! $self->element_hash( ! {'Hsp_qseq' => $hsp->{'query'}, ! 'Hsp_hseq' => $hsp->{'hit'}, ! 'Hsp_midline' => $hsp->{'midline'}, ! 'Hsp_structure' => $metastr, ! 'Hsp_query-from' => 1, ! 'Infernal_query-len' => $strlen, ! 'Hsp_query-to' => $strlen, ! 'Hsp_hit-from' => $laststart, ! 'Hsp_hit-to' => $lastend, ! 'Hsp_score' => $lastscore, ! 'Hsp_bit-score' => $lastscore ! }); ! $self->end_element({'Name' => 'Hsp'}); ! } ! } elsif ($line =~ m{^memory}xms || $line =~ m{^CYK\smemory}xms ) { ! if ($self->within_element('result') && $seentop) { ! $self->element( ! {'Name' => 'Infernal_version', ! 'Data' => $version} ! ); ! if ($self->in_element('hit')) { ! $self->element_hash({'Hit_score' => $maxscore, ! 'Hit_bits' => $maxscore}); ! $self->end_element({'Name' => 'Hit'}); ! } ! last PARSER; ! } ! } } ! $self->within_element('hit') && $self->end_element( { 'Name' => 'Hit' } ); ! $self->end_element( { 'Name' => 'Result' } ) if $seentop; ! return $self->end_document(); } --- 210,217 ---- sub next_result { my ($self) = @_; ! unless ($self->{'_handlerset'}) { ! $self->_set_handler; } ! return $self->_next_result; } *************** *** 747,751 **** Returns : converted string Args : [required] string to convert ! Note : This is a very simple comversion method to get simple bracketed format from Infernal data. If the convert_meta() flag is set, this is the method used to convert the strings. --- 621,625 ---- Returns : converted string Args : [required] string to convert ! Note : This is a very simple conversion method to get simple bracketed format from Infernal data. If the convert_meta() flag is set, this is the method used to convert the strings. *************** *** 768,770 **** --- 642,813 ---- } + ## private methods + + sub _next_result { + shift->throw("Handler not set!"); + } + + # this guesses the format and sets the proper handler for parsing + sub _set_handler { + my $self = shift; + *_next_result = \&_parse_old; + return; + my $line; + while ($line = $self->_readline) { + # advance to first line + next if $line =~ m{^\s*$}; + # newer output starts with model name + if ($line =~ m{^CM\s\d+:}) { + *_next_result = \&_parse_new; + } else { + *_next_result = \&_parse_old; + } + last; + } + $self->_pushback($line); + } + + # cmsearch 0.81 and above + sub _parse_new { + my ($self) = @_; + $self->throw("Parsing of v.0.81 output not implemented yet!"); + } + + sub _parse_old { + my ($self) = @_; + my $seentop = 0; + local $/ = "\n"; + local $_; + my ($accession, $db, $algorithm, $model, $description, $version) = + ($self->query_accession, $self->database, $self->algorithm, + $self->model, $self->query_description, $self->version); + my $maxscore; + my $cutoff = $self->hsp_minscore; + $self->start_document(); + local ($_); + my $line; + my ($lasthit, $lastscore, $laststart, $lastend); + my $hitline; + PARSER: + while ( defined( $line = $self->_readline ) ) { + next if $line =~ m{^\s+$}; + # bypass this for now... + next if $line =~ m{^HMM\shit}; + # pre-0.81 + if ($line =~ m{^sequence:\s+(\S+)} ){ + if (!$self->within_element('result')) { + $seentop = 1; + $self->start_element({'Name' => 'Result'}); + $self->element_hash({ + 'Infernal_program' => $algorithm, + 'Infernal_query-def' => $model, + 'Infernal_query-acc' => $accession, + 'Infernal_querydesc' => $description, + 'Infernal_db' => $db + }); + } + if ($self->in_element('hit')) { + $self->element_hash({'Hit_score' => $maxscore, + 'Hit_bits' => $maxscore}); + $maxscore = undef; + $self->end_element({'Name' => 'Hit'}); + } + $lasthit = $1; + } elsif ($line =~ m{^hit\s+\d+\s+:\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+bits}xms) { + ($laststart, $lastend, $lastscore) = ($1, $2, $3); + $maxscore = $lastscore unless $maxscore; + if ($lastscore > $cutoff) { + if (!$self->within_element('hit')) { + my ($gi, $acc, $ver) = $self->_get_seq_identifiers($lasthit); + $self->start_element({'Name' => 'Hit'}); + $self->element_hash({ + 'Hit_id' => $lasthit, + 'Hit_accession' => $ver ? "$acc.$ver" : + $acc ? $acc : $lasthit, + 'Hit_gi' => $gi + }); + } + # necessary as infernal 0.71 has repeated hit line + if (!$self->in_element('hsp')) { + $self->start_element({'Name' => 'Hsp'}); + } + $maxscore = ($maxscore < $lastscore) ? $lastscore : + $maxscore; + } + } elsif ($line =~ m{^(\s+)[<>\{\}\(\)\[\]:_,-\.]+}xms) { # start of HSP + $self->_pushback($line); # set up for loop + # what is length of the gap to the structure data? + my $offset = length($1); + my ($ct, $strln) = 0; + my $hsp; + HSP: + my %hsp_key = ('0' => 'meta', + '1' => 'query', + '2' => 'midline', + '3' => 'hit'); + HSP: + while ($line = $self->_readline) { + next if $line =~ m{^\s*$}; # toss empty lines + chomp $line; + # exit loop if at end of file or upon next hit/HSP + if (!defined($line) || $line =~ m{^\S+}) { + $self->_pushback($line); + last HSP; + } + # iterate to keep track of each line (4 lines per hsp block) + my $iterator = $ct%4; + # strlen set only with structure lines (proper length) + $strln = length($line) if $iterator == 0; + # only grab the data needed (hit start and stop in hit line above) + + my $data = substr($line, $offset, $strln-$offset); + $hsp->{ $hsp_key{$iterator} } .= $data; + $ct++; + } + # query start, end are from the actual query length (entire hit is + # mapped to CM data, so all CM data is represented) + # works for now... + if ($self->in_element('hsp')) { + my $strlen = $hsp->{'query'} =~ tr{A-Za-z}{A-Za-z}; + + my $metastr; + # Ugh...these should be passed in a hash + $metastr = ($self->convert_meta) ? ($self->simple_meta($hsp->{'meta'})) : + ($hsp->{'meta'}); + $self->element_hash( + {'Hsp_qseq' => $hsp->{'query'}, + 'Hsp_hseq' => $hsp->{'hit'}, + 'Hsp_midline' => $hsp->{'midline'}, + 'Hsp_structure' => $metastr, + 'Hsp_query-from' => 1, + 'Infernal_query-len' => $strlen, + 'Hsp_query-to' => $strlen, + 'Hsp_hit-from' => $laststart, + 'Hsp_hit-to' => $lastend, + 'Hsp_score' => $lastscore, + 'Hsp_bit-score' => $lastscore + }); + $self->end_element({'Name' => 'Hsp'}); + } + } elsif ($line =~ m{^memory}xms || $line =~ m{^CYK\smemory}xms ) { + if ($self->within_element('result') && $seentop) { + $self->element( + {'Name' => 'Infernal_version', + 'Data' => $version} + ); + if ($self->in_element('hit')) { + $self->element_hash({'Hit_score' => $maxscore, + 'Hit_bits' => $maxscore}); + $self->end_element({'Name' => 'Hit'}); + } + last PARSER; + } + } + } + $self->within_element('hit') && $self->end_element( { 'Name' => 'Hit' } ); + $self->end_element( { 'Name' => 'Result' } ) if $seentop; + return $self->end_document(); + } + + 1; From cjfields at dev.open-bio.org Wed Aug 8 16:38:54 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Wed, 08 Aug 2007 20:38:54 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Index Abstract.pm, 1.56, 1.57 Blast.pm, 1.29, 1.30 Message-ID: <200708082038.l78Kcsoi029844@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Index In directory dev.open-bio.org:/tmp/cvs-serv29819 Modified Files: Abstract.pm Blast.pm Log Message: add use statement if only wanting to index; fix debugging Index: Blast.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Index/Blast.pm,v retrieving revision 1.29 retrieving revision 1.30 diff -C2 -d -r1.29 -r1.30 *** Blast.pm 14 Jun 2007 14:16:11 -0000 1.29 --- Blast.pm 8 Aug 2007 20:38:52 -0000 1.30 *************** *** 79,84 **** use IO::String; ! use Bio::Root::Version; ! use base qw(Bio::Index::Abstract Bio::Root::Root); --- 79,83 ---- use IO::String; ! use Bio::SearchIO; use base qw(Bio::Index::Abstract Bio::Root::Root); *************** *** 215,219 **** for (my $result = $report->next_result) { my $id = $result->query_name; ! print "id is $id, begin is $begin\n" if ( $self->verbose > 0); $self->add_record($id, $i, $begin); } --- 214,218 ---- for (my $result = $report->next_result) { my $id = $result->query_name; ! $self->debug("id is $id, begin is $begin\n"); $self->add_record($id, $i, $begin); } Index: Abstract.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Index/Abstract.pm,v retrieving revision 1.56 retrieving revision 1.57 diff -C2 -d -r1.56 -r1.57 *** Abstract.pm 22 Mar 2007 18:30:09 -0000 1.56 --- Abstract.pm 8 Aug 2007 20:38:52 -0000 1.57 *************** *** 595,599 **** # index this file ! $self->debug("Indexing file $file"); # this is supplied by the subclass and does the serious work --- 595,599 ---- # index this file ! $self->debug("Indexing file $file\n"); # this is supplied by the subclass and does the serious work From bugzilla-daemon at portal.open-bio.org Sat Aug 11 09:27:20 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 09:27:20 -0400 Subject: [Bioperl-guts-l] [Bug 2342] New: blastall crash & StandAloneBlast (originally described by Matthew Laird) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2342 Summary: blastall crash & StandAloneBlast (originally described by Matthew Laird) Product: BioPerl Version: unspecified Platform: Other OS/Version: Linux Status: NEW Severity: minor Priority: P2 Component: bioperl-run AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: gyang at plantbio.uga.edu This should be filed as a bug if possible; could you do that? http://www.bioperl.org/wiki/Bugs Suggestions have been made many times previously that StandAloneBlast, RemoteBlast, etc be combined to use a common API, incorporate other BLAST implementations (i.e. WU-BLAST, NCBI's netblast, etc), and maybe utilize other cross-platform compatible means of running programs and passing off reports to parsers. In fact, Jason, Roger Hall, Torsten, and I discussed tentative plans for plugin-able BLAST wrappers: http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast Though they have never been acted upon. If I get time towards the end of fall and manage to finish up some other projects I may try taking this on, maybe using the wiki to track progress. chris On Aug 10, 2007, at 10:23 AM, Guojun Yang wrote: > Hi, Chris, > Interestingly, I found the message in bioperl-l from Matthew Laird > 2005 "Blastall & StandAloneBlast". "...the Odd thing is, Blast DOES > run. If one comments out this line in StandAloneBlast.pm, the > execution succeeds perfectly fine". It seemed to be mysterious when > I uncommented the " $self->throw("$executable call crashed: $? $! > $commandstring\n") unless ($status==0) ;" line, the blastall runs. > The only difference from what Matthew saw is that, when I did not > uncomment the line, blastall DID NOT run. > Thanks, > Guojun > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > To: Chris Fields [mailto:cjfields at uiuc.edu] > Cc: bioperl-l at lists.open-bio.org > Sent: Thu, 09 Aug 2007 15:03:21 -0400 > Subject: standalone blastall call crashed, please help > > Hi, Chris, > Thanks a lot for your efforts. With your help, I am gaining more > confidence to fix the cgi code. While the remoteblast problem is > fixed now, I am caught in a local blast problem (see the error > message and subroutine). The line starting with * is line 593 in > the error message. I tried command line blastall, it works fine. I > set the permission to all the blast folders and files, it did not > help much. The same sequence and database works OK if I use command > line blastall. I used the seq object ref $query as query, the error > message gives "-i /tmp/...", does this look like an input problem? > The subroutine was working before early 2006 (on a different > machine), I am wondering whether this is due to changes in the > StandAloneBlast.pm? Best, Guojun > > I set the blast env variables: > > BEGIN {$ENV{BLASTDIR} = '/usr/blast-2.2.10/bin'; } > BEGIN {$ENV{BLASTDB}='/usr/blast-2.2.10/data';} > BEGIN {$ENV{BLASTMAT}='/usr/blast-2.2.10/data';} > $PROGRAMDIR = $ENV{'BLASTDIR'} || ''; > ...... > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: -1 /usr/blast-2.2.10/bin/blastall -d "/ > usr/blast-2.2.10/data/swissprot" -e 0.001 -i /tmp/3cjvQyodxg - > o /tmp/4qSSO16EZP -p blastx > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.3/Bio/ > Root/Root.pm:359 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/ > site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:813 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/ > lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:760 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/ > site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:570 > STACK: main::ancestor makcgi07.txt:593 > STACK: makcgi07.txt:208 > sub ancestor { > use Bio::Tools::Run::StandAloneBlast; > use Bio::SearchIO::blast; > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > -id=>"test"); > print $query->seq(); > my $len=$query->length(); > my $long_name=$_[1]; > my $long_start=$_[2]; > my $long_end=$_[3]; > @db=('swissprot'); > foreach my $db (@db) { > my $factory = Bio::Tools::Run::StandAloneBlast->new(-program => > "blastx", > -database > => "$db", > -e => 1e-3, > ); > * my $blast_report = $factory->blastall($query); > while (my $result = $blast_report->next_result) { > while( my $hit = $result->next_hit()) { > $hit_name=$hit->name; > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > $name=$1; > $desc = $hit->description(); > if ($desc =~ /.*{|\btransposon\b|\btransposase > \b|}.*/i){ > $AN=0; > $replica=0; > while ($ancestor_name[$AN]) { > $replica=1 if (($ancestor_name[$AN] eq > $long_name) && ($hitname[$AN] eq $name)); > $AN+=1; > } > if ($replica==0) { > push @ancestor_name, $long_name; > push @ancestor_start, $long_start; > push @ancestor_end, $long_end; > push @desc, $desc; > push @hitname,$name; > } > } > } > }} > return @ancestor_name, at ancestor_start, at ancestor_end, at desc; > } > > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From nathan at dev.open-bio.org Tue Aug 14 04:13:07 2007 From: nathan at dev.open-bio.org (Nathan Haigh) Date: Tue, 14 Aug 2007 08:13:07 +0000 Subject: [Bioperl-guts-l] bioperl-live/t Genpred.t,1.26,1.27 Message-ID: <200708140813.l7E8D7Ve005744@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/t In directory dev.open-bio.org:/tmp/cvs-serv5719/t Modified Files: Genpred.t Log Message: Changed variable name so that multiple "my" statements don't mask others in the same scope Index: Genpred.t =================================================================== RCS file: /home/repository/bioperl/bioperl-live/t/Genpred.t,v retrieving revision 1.26 retrieving revision 1.27 diff -C2 -d -r1.26 -r1.27 *** Genpred.t 1 Jul 2007 22:02:51 -0000 1.26 --- Genpred.t 14 Aug 2007 08:13:05 -0000 1.27 *************** *** 259,282 **** # Glimmer 3.X (prokaryotic gene fragment) ! my $glimmer_3 = Bio::Tools::Glimmer->new( '-file' => test_input_file('glimmer3-fragment.predict'), '-detail' => test_input_file('glimmer3-fragment.detail'), ); ! my $g3gene = $glimmer_3->next_prediction; ! ok($g3gene); ! isa_ok $g3gene->location(), 'Bio::Location::Fuzzy'; ! is $g3gene->location->start_pos_type(), 'BEFORE'; ! is $g3gene->location->max_start(), 1; ! is $g3gene->location->end_pos_type(), 'EXACT'; ! is $g3gene->location->end(), 674; ! for (1..3) { $g3gene = $glimmer_3->next_prediction; } ! isa_ok $g3gene->location(), 'Bio::Location::Fuzzy'; ! is $g3gene->location->start_pos_type(), 'EXACT'; ! is $g3gene->location->start(), 2677; ! is $g3gene->location->end_pos_type(), 'AFTER'; ! is $g3gene->location->min_end(), 2932; --- 259,282 ---- # Glimmer 3.X (prokaryotic gene fragment) ! my $glimmer_3a = Bio::Tools::Glimmer->new( '-file' => test_input_file('glimmer3-fragment.predict'), '-detail' => test_input_file('glimmer3-fragment.detail'), ); ! my $g3gene_a = $glimmer_3a->next_prediction; ! ok($g3gene_a); ! isa_ok $g3gene_a->location(), 'Bio::Location::Fuzzy'; ! is $g3gene_a->location->start_pos_type(), 'BEFORE'; ! is $g3gene_a->location->max_start(), 1; ! is $g3gene_a->location->end_pos_type(), 'EXACT'; ! is $g3gene_a->location->end(), 674; ! for (1..3) { $g3gene_a = $glimmer_3a->next_prediction; } ! isa_ok $g3gene_a->location(), 'Bio::Location::Fuzzy'; ! is $g3gene_a->location->start_pos_type(), 'EXACT'; ! is $g3gene_a->location->start(), 2677; ! is $g3gene_a->location->end_pos_type(), 'AFTER'; ! is $g3gene_a->location->min_end(), 2932; From bugzilla-daemon at portal.open-bio.org Tue Aug 14 12:59:53 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Aug 2007 12:59:53 -0400 Subject: [Bioperl-guts-l] [Bug 2344] New: Start/end wrong after removing columns in Bio::SimpleAlign Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2344 Summary: Start/end wrong after removing columns in Bio::SimpleAlign Product: BioPerl Version: unspecified Platform: PC OS/Version: All Status: NEW Severity: major Priority: P1 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: bix at sendu.me.uk When Bio::SimpleAlign methods are used that result in the removal of columns, the resulting new alignment does not have the start/end of each sequence corrected. I attach patches that enable this. Only the start/end is corrected: the issue of internal column removals preventing direct mapping back to the original sequence is not tackled here. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Aug 14 13:02:04 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Aug 2007 13:02:04 -0400 Subject: [Bioperl-guts-l] [Bug 2344] Start/end wrong after removing columns in Bio::SimpleAlign In-Reply-To: Message-ID: <200708141702.l7EH24fb030467@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2344 bix at sendu.me.uk changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bioperl-guts-l at bioperl.org |bix at sendu.me.uk Status|NEW |ASSIGNED ------- Comment #1 from bix at sendu.me.uk 2007-08-14 13:02 EST ------- Created an attachment (id=726) --> (http://bugzilla.open-bio.org/attachment.cgi?id=726&action=view) Patch for SimpleAlign.pm -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From heikki at dev.open-bio.org Wed Aug 15 04:55:49 2007 From: heikki at dev.open-bio.org (Heikki Lehvaslaiho) Date: Wed, 15 Aug 2007 08:55:49 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/LiveSeq SeqI.pm,1.32,1.33 Message-ID: <200708150855.l7F8tnNt008117@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/LiveSeq In directory dev.open-bio.org:/tmp/cvs-serv8108 Modified Files: SeqI.pm Log Message: hack to get LiveSeq working with newer EMBL seqs with more verbose molecule type strings. (Method is still misleadingly called alphabet.) Index: SeqI.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/LiveSeq/SeqI.pm,v retrieving revision 1.32 retrieving revision 1.33 diff -C2 -d -r1.32 -r1.33 *** SeqI.pm 26 Sep 2006 22:03:08 -0000 1.32 --- SeqI.pm 15 Aug 2007 08:55:47 -0000 1.33 *************** *** 753,757 **** Returns : a string either 'dna','rna','protein'. Args : none - Note : "circular dna" is set as dna =cut --- 753,756 ---- *************** *** 762,766 **** my ($self,$value) = @_; if (defined $value) { ! $value =~ s/circular dna/dna/; unless ( $valid_type{$value} ) { $self->warn("Molecular type '$value' is not a valid type"); --- 761,766 ---- my ($self,$value) = @_; if (defined $value) { ! $value = 'dna' if $value =~ /dna/i; ! $value = 'rna' if $value =~ /rna/i; unless ( $valid_type{$value} ) { $self->warn("Molecular type '$value' is not a valid type"); From bugzilla-daemon at portal.open-bio.org Wed Aug 15 08:26:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Aug 2007 08:26:44 -0400 Subject: [Bioperl-guts-l] [Bug 2345] New: Get mappable sequence coords after removing alignment columns Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2345 Summary: Get mappable sequence coords after removing alignment columns Product: BioPerl Version: unspecified Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P3 Component: Core Components AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: bix at sendu.me.uk With regard to bug 2344, Chris Fields suggested the following: a/5-20 atcgatcgatcgatcg b/30-43 -tcgatc-atcgatcg c/50-63 atcgatcgatc-atc- ****** *** *** [ remove the gaps from the alignment in Bio::SimpleAlign, then... ] Adjusting to map simple start/ends to the original sequence won't work as we're removing gaps and residues in the LocatableSeqs along with it (ends and internal residues). I guess if we want to map back to the original sequence accurately we would have to use split locations (not currently implemented with LocatableSeq) or maybe a cigar-like syntax against consensus (ugh), otherwise we wouldn't know where to map the relevant internal gaps (now missing from the alignment) w/o running a local alignment against the original sequence: a/6-11;12-19 tcgatcatcatc b/30-38;40-42 tcgatcatcatc c/51-56;58-63 tcgatcatcatc ************ That could get really hairy for long alignments. We could also return multiple SimpleAligns which map correctly (ugh), but what we really want (and the API specifies) is a new single SimpleAlign. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sendu at dev.open-bio.org Wed Aug 15 11:06:35 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Wed, 15 Aug 2007 15:06:35 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio LocatableSeq.pm,1.44,1.45 Message-ID: <200708151506.l7FF6ZL8008533@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio In directory dev.open-bio.org:/tmp/cvs-serv8508/Bio Modified Files: LocatableSeq.pm Log Message: end() now allows getting and setting of 0, allowing the representation of gaps-only in a sequence alignment Index: LocatableSeq.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/LocatableSeq.pm,v retrieving revision 1.44 retrieving revision 1.45 diff -C2 -d -r1.44 -r1.45 *** LocatableSeq.pm 26 Jun 2007 08:55:03 -0000 1.44 --- LocatableSeq.pm 15 Aug 2007 15:06:33 -0000 1.45 *************** *** 112,116 **** Title : start Usage : $obj->start($newval) ! Function: Returns : value of start Args : newvalue (optional) --- 112,117 ---- Title : start Usage : $obj->start($newval) ! Function: Get/set the 1-based start position of this sequence in the original ! sequence. '0' means before the original sequence starts. Returns : value of start Args : newvalue (optional) *************** *** 133,137 **** Title : end Usage : $obj->end($newval) ! Function: Returns : value of end Args : newvalue (optional) --- 134,139 ---- Title : end Usage : $obj->end($newval) ! Function: Get/set the 1-based end position of this sequence in the original ! sequence. '0' means before the original sequence starts. Returns : value of end Args : newvalue (optional) *************** *** 155,159 **** } ! return $self->{'end'} || $self->_ungapped_len; } --- 157,161 ---- } ! return defined $self->{'end'} ? $self->{'end'} : $self->_ungapped_len; } From sendu at dev.open-bio.org Wed Aug 15 11:10:37 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Wed, 15 Aug 2007 15:10:37 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio SimpleAlign.pm,1.137,1.138 Message-ID: <200708151510.l7FFAb00008608@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio In directory dev.open-bio.org:/tmp/cvs-serv8583/Bio Modified Files: SimpleAlign.pm Log Message: bug 2344 fixed Index: SimpleAlign.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/SimpleAlign.pm,v retrieving revision 1.137 retrieving revision 1.138 diff -C2 -d -r1.137 -r1.138 *** SimpleAlign.pm 2 Jul 2007 12:49:19 -0000 1.137 --- SimpleAlign.pm 15 Aug 2007 15:10:35 -0000 1.138 *************** *** 259,262 **** --- 259,263 ---- $order = keys %{$self->{'_seq'}}; } + $self->throw("$id, $start..$end") unless $id && defined $start && defined $end; $name = sprintf("%s/%d-%d",$id,$start,$end); *************** *** 1114,1118 **** my ($self,$aln,$remove) = @_; my @new; ! # splice out the segments and create new seq foreach my $seq($self->each_seq){ --- 1115,1121 ---- my ($self,$aln,$remove) = @_; my @new; ! ! my $gap = $self->gap_char; ! # splice out the segments and create new seq foreach my $seq($self->each_seq){ *************** *** 1127,1150 **** my $end = $pair->[1]; $sequence = $seq->seq unless $sequence; ! my $spliced; ! $spliced .= $start > 0 ? substr($sequence,0,$start) : ''; ! $spliced .= substr($sequence,$end+1,$seq->length-$end+1); ! $sequence = $spliced; ! if ($start == 1) { ! $new_seq->start($end); ! } ! else { ! $new_seq->start( $seq->start); } # end ! if($end >= $seq->end){ ! $new_seq->end( $start); } else { ! $new_seq->end($seq->end); } } $new_seq->seq($sequence) if $sequence; ! push @new, $new_seq; } # add the new seqs to the alignment --- 1130,1170 ---- my $end = $pair->[1]; $sequence = $seq->seq unless $sequence; ! my $orig = $sequence; ! my $head = $start > 0 ? substr($sequence, 0, $start) : ''; ! my $tail = ($end + 1) >= length($sequence) ? '' : substr($sequence, $end + 1); ! $sequence = $head.$tail; ! # start ! unless (defined $new_seq->start) { ! if ($start == 0) { ! my $start_adjust = () = substr($orig, 0, $end + 1) =~ /$gap/g; ! $new_seq->start($seq->start + $end + 1 - $start_adjust); ! } ! else { ! my $start_adjust = $orig =~ /^$gap+/; ! if ($start_adjust) { ! $start_adjust = $+[0] == $start; ! } ! $new_seq->start($seq->start + $start_adjust); ! } } # end ! if (($end + 1) >= length($orig)) { ! my $end_adjust = () = substr($orig, $start) =~ /$gap/g; ! $new_seq->end($seq->end - (length($orig) - $start) + $end_adjust); } else { ! $new_seq->end($seq->end); } } + + if ($new_seq->end < $new_seq->start) { + # we removed all columns except for gaps: set to 0 to indicate no + # sequence + $new_seq->start(0); + $new_seq->end(0); + } + $new_seq->seq($sequence) if $sequence; ! push @new, $new_seq; } # add the new seqs to the alignment *************** *** 1205,1211 **** my $aln = $self->new; ! # sort the positions to remove columns at the end 1st ! @$positions = sort { $b->[0] <=> $a->[0] } @$positions; ! $aln = $self->_remove_col($aln,$positions); $aln; } --- 1225,1245 ---- my $aln = $self->new; ! # sort the positions ! @$positions = sort { $a->[0] <=> $b->[0] } @$positions; ! ! my @remove; ! my $length = 0; ! foreach my $pos (@{$positions}) { ! my ($start, $end) = @{$pos}; ! ! #have to offset the start and end for subsequent removes ! $start-=$length; ! $end -=$length; ! $length += ($end-$start+1); ! push @remove, [$start,$end]; ! } ! ! #remove the segments ! $aln = $#remove >= 0 ? $self->_remove_col($aln,\@remove) : $self; $aln; } *************** *** 2098,2113 **** my $self = shift; my $seq; ! my $length = (-1); ! my ($temp,$len); ! foreach $seq ( $self->each_seq() ) { ! if ($self->isa("Bio::Seq::LargeSeqI")) { ! $temp = $seq->length(); ! } else { ! $temp = $seq->length; } - if( $temp > $length ) { - $length = $temp; - } } --- 2132,2143 ---- my $self = shift; my $seq; ! my $length = -1; ! my $temp; ! foreach $seq ( $self->each_seq() ) { ! $temp = $seq->length(); ! if( $temp > $length ) { ! $length = $temp; } } From sendu at dev.open-bio.org Wed Aug 15 11:11:12 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Wed, 15 Aug 2007 15:11:12 +0000 Subject: [Bioperl-guts-l] bioperl-live/t SimpleAlign.t,1.56,1.57 Message-ID: <200708151511.l7FFBC0D008641@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/t In directory dev.open-bio.org:/tmp/cvs-serv8616/t Modified Files: SimpleAlign.t Log Message: added tests for bug 2344 Index: SimpleAlign.t =================================================================== RCS file: /home/repository/bioperl/bioperl-live/t/SimpleAlign.t,v retrieving revision 1.56 retrieving revision 1.57 diff -C2 -d -r1.56 -r1.57 *** SimpleAlign.t 27 Jun 2007 10:16:37 -0000 1.56 --- SimpleAlign.t 15 Aug 2007 15:11:10 -0000 1.57 *************** *** 8,12 **** use BioperlTest; ! test_begin(-tests => 115); use_ok('Bio::SimpleAlign'); --- 8,12 ---- use BioperlTest; ! test_begin(-tests => 151); use_ok('Bio::SimpleAlign'); *************** *** 212,217 **** $strout->write_aln($aln2); is $string, ! "P84139/1-33 NEGEHQIKLDELFEKLLRARLIFKNKDVLRRC\n". ! "P814153/1-33 NEGMHQIKLDVLFEKLLRARLIFKNKDVLRRC\n". "BAB68554/1-14 ------------------AMLIFKDKQLLQQC\n". "gb|443893|124775/1-32 MRFRFQIKVPPAVEGARPALLIFKSRPELGGC\n", --- 212,217 ---- $strout->write_aln($aln2); is $string, ! "P84139/2-33 NEGEHQIKLDELFEKLLRARLIFKNKDVLRRC\n". ! "P814153/2-33 NEGMHQIKLDVLFEKLLRARLIFKNKDVLRRC\n". "BAB68554/1-14 ------------------AMLIFKDKQLLQQC\n". "gb|443893|124775/1-32 MRFRFQIKVPPAVEGARPALLIFKSRPELGGC\n", *************** *** 227,231 **** "P814153/1-33 MEGMIKLDVLFEKLLRARLIFKNKDVLRC\n". "BAB68554/1-14 ----------------AMLIFKDKQLLQC\n". ! "gb|443893|124775/1-32 -RFRIKVPPAVEGARPALLIFKSRPELGC\n", 'remove_columns by position (wrong order)'; --- 227,231 ---- "P814153/1-33 MEGMIKLDVLFEKLLRARLIFKNKDVLRC\n". "BAB68554/1-14 ----------------AMLIFKDKQLLQC\n". ! "gb|443893|124775/2-32 -RFRIKVPPAVEGARPALLIFKSRPELGC\n", 'remove_columns by position (wrong order)'; *************** *** 458,459 **** --- 458,560 ---- is($str, $testdata{$ls},"BIC:$str"); } + + + # is _remove_col really working correctly? + my $a = Bio::LocatableSeq->new(-id => 'a', -seq => 'atcgatcgatcgatcg', -start => 5, -end => 20); + my $b = Bio::LocatableSeq->new(-id => 'b', -seq => '-tcgatc-atcgatcg', -start => 30, -end => 43); + my $c = Bio::LocatableSeq->new(-id => 'c', -seq => 'atcgatcgatc-atc-', -start => 50, -end => 63); + my $d = Bio::LocatableSeq->new(-id => 'd', -seq => '--cgatcgatcgat--', -start => 80, -end => 91); + my $e = Bio::LocatableSeq->new(-id => 'e', -seq => '-t-gatcgatcga-c-', -start => 100, -end => 111); + $aln = Bio::SimpleAlign->new(); + $aln->add_seq($a); + $aln->add_seq($b); + $aln->add_seq($c); + + my $gapless = $aln->remove_gaps(); + foreach my $seq ($gapless->each_seq) { + if ($seq->id eq 'a') { + is $seq->start, 6; + is $seq->end, 19; + is $seq->seq, 'tcgatcatcatc'; + } + elsif ($seq->id eq 'b') { + is $seq->start, 30; + is $seq->end, 42; + is $seq->seq, 'tcgatcatcatc'; + } + elsif ($seq->id eq 'c') { + is $seq->start, 51; + is $seq->end, 63; + is $seq->seq, 'tcgatcatcatc'; + } + } + + $aln->add_seq($d); + $aln->add_seq($e); + $gapless = $aln->remove_gaps(); + foreach my $seq ($gapless->each_seq) { + if ($seq->id eq 'a') { + is $seq->start, 8; + is $seq->end, 17; + is $seq->seq, 'gatcatca'; + } + elsif ($seq->id eq 'b') { + is $seq->start, 32; + is $seq->end, 40; + is $seq->seq, 'gatcatca'; + } + elsif ($seq->id eq 'c') { + is $seq->start, 53; + is $seq->end, 61; + is $seq->seq, 'gatcatca'; + } + elsif ($seq->id eq 'd') { + is $seq->start, 81; + is $seq->end, 90; + is $seq->seq, 'gatcatca'; + } + elsif ($seq->id eq 'e') { + is $seq->start, 101; + is $seq->end, 110; + is $seq->seq, 'gatcatca'; + } + } + + my $f = Bio::LocatableSeq->new(-id => 'f', -seq => 'a-cgatcgatcgat-g', -start => 30, -end => 43); + $aln = Bio::SimpleAlign->new(); + $aln->add_seq($a); + $aln->add_seq($f); + + $gapless = $aln->remove_gaps(); + foreach my $seq ($gapless->each_seq) { + if ($seq->id eq 'a') { + is $seq->start, 5; + is $seq->end, 20; + is $seq->seq, 'acgatcgatcgatg'; + } + elsif ($seq->id eq 'f') { + is $seq->start, 30; + is $seq->end, 43; + is $seq->seq, 'acgatcgatcgatg'; + } + } + + my $g = Bio::LocatableSeq->new(-id => 'g', -seq => 'atgc', -start => 5, -end => 8); + my $h = Bio::LocatableSeq->new(-id => 'h', -seq => '-tcg', -start => 30, -end => 32); + $aln = Bio::SimpleAlign->new(); + $aln->add_seq($g); + $aln->add_seq($h); + + my $removed = $aln->remove_columns([1, 3]); + foreach my $seq ($removed->each_seq) { + if ($seq->id eq 'g') { + is $seq->start, 5; + is $seq->end, 5; + is $seq->seq, 'a'; + } + elsif ($seq->id eq 'h') { + is $seq->start, 0; + is $seq->end, 0; + is $seq->seq, '-'; + } + } \ No newline at end of file From sendu at dev.open-bio.org Wed Aug 15 11:24:18 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Wed, 15 Aug 2007 15:24:18 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio SimpleAlign.pm,1.138,1.139 Message-ID: <200708151524.l7FFOI0R008678@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio In directory dev.open-bio.org:/tmp/cvs-serv8653/Bio Modified Files: SimpleAlign.pm Log Message: removed debugging throw from last commit Index: SimpleAlign.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/SimpleAlign.pm,v retrieving revision 1.138 retrieving revision 1.139 diff -C2 -d -r1.138 -r1.139 *** SimpleAlign.pm 15 Aug 2007 15:10:35 -0000 1.138 --- SimpleAlign.pm 15 Aug 2007 15:24:16 -0000 1.139 *************** *** 259,263 **** $order = keys %{$self->{'_seq'}}; } - $self->throw("$id, $start..$end") unless $id && defined $start && defined $end; $name = sprintf("%s/%d-%d",$id,$start,$end); --- 259,262 ---- From bugzilla-daemon at portal.open-bio.org Thu Aug 16 03:57:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 03:57:28 -0400 Subject: [Bioperl-guts-l] [Bug 2346] New: exonerate parser in bioperl-live fails when protein2dna comparison is performed Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2346 Summary: exonerate parser in bioperl-live fails when protein2dna comparison is performed Product: BioPerl Version: main-trunk Platform: Macintosh OS/Version: Mac OS Status: NEW Keywords: Bioperl Severity: enhancement Priority: P2 Component: Bio::Search/Bio::SearchIO AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: tania.oh at anat.ox.ac.uk I was trying to use the Bio::SearchIO::Alignment::Exonerate module to run and parse my exonerate output. But I've noticed that the parser which is actually Bio::SearchIO::Exonerate works if the model used in Exonerate is --model est2genome. I used exonerate with the model --model protein2dna and the parser was unable to parse the hsps. Below is a simple of code I used for testing the output from exonerate: use Bio::SearchIO; use strict; ???my $searchio = Bio::SearchIO->new(-file => 'test_data/exonerate.output.dontwork???', -format => 'exonerate'); while( my $r = $searchio->next_result ) { while(my $hit = $r->next_hit){ while(my $hsp = $hit->next_hsp){ print $hsp->start. "\t". $hsp->end. "\n"; } } print $r->query_name, "\n"; } There are 2 files attached to show the examples of using either the est2genome or protein2dna model: 1. exonerate.output.works - produced from the command line: exonerate -q exonerate_cdna.fa -t exonerate_genomic.fa --model est2genome --bestn 1 > exonerate.output.works 2. exonerate.output.dontwork - produced from the command line: exonerate -q test_aa.fa -t test_cds.fa --model protein2dna > exonerate.output.dontwork Line 239 in Bio::searchIO::exonerate (cut and pasted below) elsif( s/^vulgar:\s+(\S+)\s+ # query sequence id (\d+)\s+(\d+)\s+([\-\+])\s+ # query start-end-strand (\S+)\s+ # target sequence id (\d+)\s+(\d+)\s+([\-\+])\s+ # target start-end-strand (\d+)\s+ # score //ox ) { parses the vulgar line of an --model est2genome exonerate output well. An example of the (complex) vulgar line which I've truncated for readability is: vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8 G 0 1 M 231 231 5 0 2 I 0 253 3 0 whereas the vulgar line I've obtained from a --model protein2dna exonerate output is much simpler and the parser fails to pick it up: vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 16 04:00:17 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 04:00:17 -0400 Subject: [Bioperl-guts-l] [Bug 2346] exonerate parser in bioperl-live fails when protein2dna comparison is performed In-Reply-To: Message-ID: <200708160800.l7G80H8N004086@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2346 ------- Comment #1 from tania.oh at anat.ox.ac.uk 2007-08-16 04:00 EST ------- Created an attachment (id=728) --> (http://bugzilla.open-bio.org/attachment.cgi?id=728&action=view) exonerate output produced with the model protein2dna -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 16 04:00:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 04:00:46 -0400 Subject: [Bioperl-guts-l] [Bug 2346] exonerate parser in bioperl-live fails when protein2dna comparison is performed In-Reply-To: Message-ID: <200708160800.l7G80k69004171@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2346 ------- Comment #2 from tania.oh at anat.ox.ac.uk 2007-08-16 04:00 EST ------- Created an attachment (id=729) --> (http://bugzilla.open-bio.org/attachment.cgi?id=729&action=view) exonerate output produced with the model est2genome -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sendu at dev.open-bio.org Thu Aug 16 06:49:45 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Thu, 16 Aug 2007 10:49:45 +0000 Subject: [Bioperl-guts-l] bioperl-run/Bio/Tools/Run/Phylo/PAML Baseml.pm, 1.7, 1.8 Message-ID: <200708161049.l7GAnjjs010564@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-run/Bio/Tools/Run/Phylo/PAML In directory dev.open-bio.org:/tmp/cvs-serv10539/Bio/Tools/Run/Phylo/PAML Modified Files: Baseml.pm Log Message: First working version, and allows tree input. Note, however, that only running of the wrapper works: for some unknown reason parsing of the results fails silently, so there are still no tests for this module. Index: Baseml.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-run/Bio/Tools/Run/Phylo/PAML/Baseml.pm,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** Baseml.pm 14 Jun 2007 15:23:09 -0000 1.7 --- Baseml.pm 16 Aug 2007 10:49:42 -0000 1.8 *************** *** 1,5 **** # $Id$ # ! # BioPerl module for Bio::Tools::Run::Phylo::PAML::Yn00 # # Cared for by Jason Stajich --- 1,5 ---- # $Id$ # ! # BioPerl module for Bio::Tools::Run::Phylo::PAML::Baseml # # Cared for by Jason Stajich *************** *** 46,52 **** 'noisy' => [ 0..3,9], 'verbose' => [ 0,1,2], # 0:concise, 1:detailed, 2:too much ! 'runmode' => [-2,0..5], # for runmode - # -2 pairwise # 0: use the provided tree structure(s) in treefile # 1,2: mean heuristic search by star-decomposition alg --- 46,51 ---- 'noisy' => [ 0..3,9], 'verbose' => [ 0,1,2], # 0:concise, 1:detailed, 2:too much ! 'runmode' => [0..5], # for runmode # 0: use the provided tree structure(s) in treefile # 1,2: mean heuristic search by star-decomposition alg *************** *** 144,148 **** =head1 CONTRIBUTORS ! Additional contributors names and emails here =head1 APPENDIX --- 143,147 ---- =head1 CONTRIBUTORS ! Sendu Bala - bix at sendu.me.uk =head1 APPENDIX *************** *** 161,180 **** use strict; use Cwd; - use Bio::Root::Root; use Bio::AlignIO; use Bio::TreeIO; - use Bio::Tools::Run::WrapperBase; use Bio::Tools::Phylo::PAML; ! @ISA = qw(Bio::Root::Root Bio::Tools::Run::WrapperBase); ! ! ! =head2 Default Values ! - =cut BEGIN { - $MINNAMELEN = 25; $PROGRAMNAME = 'baseml' . ($^O =~ /mswin/i ?'.exe':''); --- 160,171 ---- use strict; use Cwd; use Bio::AlignIO; use Bio::TreeIO; use Bio::Tools::Phylo::PAML; ! use base qw(Bio::Tools::Run::Phylo::PhyloBase); BEGIN { $MINNAMELEN = 25; $PROGRAMNAME = 'baseml' . ($^O =~ /mswin/i ?'.exe':''); *************** *** 189,195 **** 'noisy' => [ 0..3,9], 'verbose' => [ 0,1,2], # 0:concise, 1:detailed, 2:too much ! 'runmode' => [-2,0..5], # for runmode - # -2 pairwise # 0: use the provided tree structure(s) in treefile # 1,2: mean heuristic search by star-decomposition alg --- 180,185 ---- 'noisy' => [ 0..3,9], 'verbose' => [ 0,1,2], # 0:concise, 1:detailed, 2:too much ! 'runmode' => [0..5], # for runmode # 0: use the provided tree structure(s) in treefile # 1,2: mean heuristic search by star-decomposition alg *************** *** 201,205 **** # Tree search DOES NOT WORK WELL so estimate a tree # using other programs first ! 'model' => '0', # for model # 0: JC69 (uncorrected) --- 191,195 ---- # Tree search DOES NOT WORK WELL so estimate a tree # using other programs first ! 'model' => [5, 0..8], # for model # 0: JC69 (uncorrected) *************** *** 246,263 **** # 3:N1 4:N2 'getSE' => [0,1], ! 'RateAncestor' => [1,0,2], # rates (alpha > 0) or # ancestral states 'cleandata' => [1,0], # remove sites with # ambiguity data (1:yes or 0:no) ! 'fix_blength' => [-1,0,1,2], # 0: ignore, -1: random, # 1: initial, 2: fixed ! # 'icode' => [ 0..10], # (with RateAncestor=1. #try "GC" in data,model=4,Mgene=4) ! 'ndata' => [5,1..10], 'clock' => [0..3], # 0: no clock, 1: clock, 2: local clock, 3: CombinedAnalysis 'Small_Diff' => '1e-6', #underflow issues? ! ); } --- 236,253 ---- # 3:N1 4:N2 'getSE' => [0,1], ! 'RateAncestor' => [0,1,2], # rates (alpha > 0) or # ancestral states 'cleandata' => [1,0], # remove sites with # ambiguity data (1:yes or 0:no) ! 'fix_blength' => [0,-1,1,2], # 0: ignore, -1: random, # 1: initial, 2: fixed ! 'icode' => [ 0..10], # (with RateAncestor=1. #try "GC" in data,model=4,Mgene=4) ! 'ndata' => [1..10], 'clock' => [0..3], # 0: no clock, 1: clock, 2: local clock, 3: CombinedAnalysis 'Small_Diff' => '1e-6', #underflow issues? ! 'Mgene' => [0..4], # 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff ); } *************** *** 295,302 **** Title : new ! Usage : my $obj = Bio::Tools::Run::Phylo::PAML::Yn00->new(); ! Function: Builds a new Bio::Tools::Run::Phylo::PAML::Yn00 object ! Returns : Bio::Tools::Run::Phylo::PAML::Yn00 Args : -alignment => the L object -save_tempfiles => boolean to save the generated tempfiles and NOT cleanup after onesself (default FALSE) --- 285,294 ---- Title : new ! Usage : my $obj = Bio::Tools::Run::Phylo::PAML::Baseml->new(); ! Function: Builds a new Bio::Tools::Run::Phylo::PAML::Baseml object ! Returns : Bio::Tools::Run::Phylo::PAML::Baseml Args : -alignment => the L object + -tree => the L object if you want to use runmode + 0 or 1 -save_tempfiles => boolean to save the generated tempfiles and NOT cleanup after onesself (default FALSE) *************** *** 308,317 **** my $self = $class->SUPER::new(@args); ! my ($aln,$st) = $self->_rearrange([qw(ALIGNMENT SAVE_TEMPFILES)], @args); defined $aln && $self->alignment($aln); defined $st && $self->save_tempfiles($st); - $self->set_default_parameters(); return $self; } --- 300,309 ---- my $self = $class->SUPER::new(@args); ! my ($aln,$tree,$st) = $self->_rearrange([qw(ALIGNMENT TREE SAVE_TEMPFILES)], @args); defined $aln && $self->alignment($aln); + defined $tree && $self->tree($tree); defined $st && $self->save_tempfiles($st); return $self; } *************** *** 321,325 **** Title : run Usage : $yn->run(); ! Function: run the yn00 analysis using the default or updated parameters the alignment parameter must have been set Returns : 3 values, --- 313,317 ---- Title : run Usage : $yn->run(); ! Function: run the Baseml analysis using the default or updated parameters the alignment parameter must have been set Returns : 3 values, *************** *** 330,378 **** hash reference same as the previous one except it for the Nei and Gojobori calculated Ka,Ks,omega values ! Args : none ! =cut ! sub run{ ! my ($self,$aln) = @_; ! ($aln) ||= $self->alignment(); if( ! $aln ) { ! $self->warn("must have supplied a valid aligment file in order to run yn00"); return 0; } ! my ($tmpdir) = $self->tempdir(); ! my ($tempseqFH,$tempseqfile); ! if( ! ref($aln) && -e $aln ) { ! $tempseqfile = $aln; ! } else { ! ($tempseqFH,$tempseqfile) = $self->io->tempfile ! ('-dir' => $tmpdir, ! UNLINK => ($self->save_tempfiles ? 0 : 1)); ! my $alnout = Bio::AlignIO->new(-format => 'phylip', ! -fh => $tempseqFH, ! -interleaved => 0, ! #-idlinebreak => 1, ! -line_length => 60, ! -wrap_sequential => 1, ! -idlength => $MINNAMELEN > $aln->maxdisplayname_length() ? $MINNAMELEN : $aln->maxdisplayname_length() +1); ! $alnout->write_aln($aln); ! $alnout->close(); ! undef $alnout; ! close($tempseqFH); ! undef $tempseqFH; ! } # now let's print the baseml.ctl file. # many of the these programs are finicky about what the filename is # and won't even run without the properly named file. Ack ! my $baseml_ctl = "$tmpdir/baseml.ctl"; open(BASEML, ">$baseml_ctl") or $self->throw("cannot open $baseml_ctl for writing"); print BASEML "seqfile = $tempseqfile\n"; my $outfile = $self->outfile_name; print BASEML "outfile = $outfile\n"; - my %params = $self->get_parameters; while( my ($param,$val) = each %params ) { next if $param eq 'outfile'; --- 322,396 ---- hash reference same as the previous one except it for the Nei and Gojobori calculated Ka,Ks,omega values ! Args : optionally, a value appropriate for alignment() and one for tree() ! NB : Since Baseml doesn't handle spaces in tree node ids, if a tree is ! in use spaces will be converted to underscores in both the tree node ! ids and alignment sequence ids. =cut ! sub run { ! my ($self, $aln, $tree) = @_; ! $aln = $self->alignment($aln) if $aln; ! $tree = $self->tree($tree) if $tree; ! $aln ||= $self->alignment(); ! $tree ||= $self->tree(); ! ! my %params = $self->get_parameters; if( ! $aln ) { ! $self->warn("must have supplied a valid aligment file in order to run baseml"); return 0; } ! if ((defined $params{runmode} && ($params{runmode} == 0 || $params{runmode} == 1)) && ! $tree) { ! $self->warn("must have supplied a tree in order to run baseml in runmode 0 or 1"); ! return 0; ! } ! ! # replace spaces with underscores in ids, since baseml really doesn't like ! # spaces (actually, the resulting double quotes) in tree ids ! if ($tree) { ! my $changed = 0; ! foreach my $thing ($aln->each_seq, $tree ? $tree->get_leaf_nodes : ()) { ! my $id = $thing->id; ! if ($id =~ / /) { ! $id =~ s/\s+/_/g; ! $thing->id($id); ! $changed = 1; ! } ! } ! if ($changed) { ! my $new_aln = $aln->new; ! foreach my $seq ($aln->each_seq) { ! $new_aln->add_seq($seq); ! } ! $aln = $new_aln; ! $aln = $self->alignment($aln); ! $tree = $self->tree($tree); ! } ! ! # check node and seq names match ! $self->_check_names; ! } ! ! # output the alignment and tree to tempfiles ! my $tempseqfile = $self->_write_alignment('phylip', ! -interleaved => 0, ! -idlinebreak => 1, ! -line_length => 60, ! -wrap_sequential => 1, ! -idlength => $MINNAMELEN > $aln->maxdisplayname_length() ? $MINNAMELEN : $aln->maxdisplayname_length() +1); ! $tree = $self->_write_tree() if $tree; ! # now let's print the baseml.ctl file. # many of the these programs are finicky about what the filename is # and won't even run without the properly named file. Ack ! my $tmpdir = $self->tempdir(); my $baseml_ctl = "$tmpdir/baseml.ctl"; open(BASEML, ">$baseml_ctl") or $self->throw("cannot open $baseml_ctl for writing"); print BASEML "seqfile = $tempseqfile\n"; + print BASEML "treefile = $tree\n" if $tree; my $outfile = $self->outfile_name; print BASEML "outfile = $outfile\n"; while( my ($param,$val) = each %params ) { next if $param eq 'outfile'; *************** *** 380,383 **** --- 398,402 ---- } close(BASEML); + my ($rc,$parser) = (1); { *************** *** 390,397 **** my @output = ; $exit_status = close(RUN); ! $self->error_string(join('', at output)); ! if( (grep { /\berr(or)?: /io } @output) || !$exit_status ) { ! $self->warn("There was an error - see error_string for the program output"); ! $rc = 0; } eval { --- 409,416 ---- my @output = ; $exit_status = close(RUN); ! $self->error_string(join('', grep { /\berr(or)?: /io } @output)); ! if ($self->error_string || !$exit_status) { ! $self->warn("There was an error - see error_string for the program output"); ! $rc = 0; } eval { *************** *** 403,406 **** --- 422,426 ---- $self->warn($self->error_string); } + chdir($cwd); } *************** *** 411,419 **** } } ! ! unless ( $self->save_tempfiles ) { ! unlink("$baseml_ctl"); ! $self->cleanup(); ! } return ($rc,$parser); } --- 431,435 ---- } } ! return ($rc,$parser); } *************** *** 427,436 **** Args : newvalue (optional) - =cut ! sub error_string{ my ($self,$value) = @_; if( defined $value) { $self->{'error_string'} = $value; } --- 443,452 ---- Args : newvalue (optional) =cut ! sub error_string { my ($self,$value) = @_; if( defined $value) { + chomp($value); $self->{'error_string'} = $value; } *************** *** 453,465 **** sub alignment{ ! my ($self,$aln) = @_; ! if( defined $aln ) { ! if( !ref($aln) || ! $aln->isa('Bio::Align::AlignI') ) { ! $self->warn("Must specify a valid Bio::Align::AlignI object to the alignment function"); ! return undef; ! } ! $self->{'_alignment'} = $aln; ! } ! return $self->{'_alignment'}; } --- 469,479 ---- sub alignment{ ! my $self = shift; ! return $self->_alignment(@_); ! } ! ! sub tree { ! my $self = shift; ! return $self->_tree(@_); } *************** *** 472,476 **** Args : none - =cut --- 486,489 ---- *************** *** 500,503 **** --- 513,517 ---- sub set_parameter{ my ($self,$param,$value) = @_; + if( ! defined $VALIDVALUES{$param} ) { $self->warn("unknown parameter $param will not set unless you force by setting no_param_checks to true"); *************** *** 507,511 **** scalar @{$VALIDVALUES{$param}} > 0 ) { ! unless ( grep {$value} @{ $VALIDVALUES{$param} } ) { $self->warn("parameter $param specified value $value is not recognized, please see the documentation and the code for this module or set the no_param_checks to a true value"); return 0; --- 521,526 ---- scalar @{$VALIDVALUES{$param}} > 0 ) { ! my %allowed = map { $_ => 1 } @{ $VALIDVALUES{$param} }; ! unless ( exists $allowed{$value} ) { $self->warn("parameter $param specified value $value is not recognized, please see the documentation and the code for this module or set the no_param_checks to a true value"); return 0; *************** *** 525,529 **** Returns : none Args : boolean: keep existing parameter values ! =cut --- 540,545 ---- Returns : none Args : boolean: keep existing parameter values ! NB : using this isn't an especially good idea! You don't need to do ! anything to end up using default paramters: hence 'default'! =cut *************** *** 544,548 **** } - =head1 Bio::Tools::Run::Wrapper methods --- 560,563 ---- *************** *** 589,598 **** return $self->{'_basemlparams'}->{'outfile'} = shift @_; } return $self->{'_basemlparams'}->{'outfile'}; } - - - =head2 tempdir --- 604,613 ---- return $self->{'_basemlparams'}->{'outfile'} = shift @_; } + unless (defined $self->{'_basemlparams'}->{'outfile'}) { + $self->{'_basemlparams'}->{'outfile'} = 'mlb'; + } return $self->{'_basemlparams'}->{'outfile'}; } =head2 tempdir From sendu at dev.open-bio.org Thu Aug 16 06:52:19 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Thu, 16 Aug 2007 10:52:19 +0000 Subject: [Bioperl-guts-l] bioperl-run/Bio/Tools/Run/Phylo PhyloBase.pm, 1.5, 1.6 Message-ID: <200708161052.l7GAqJiA010618@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-run/Bio/Tools/Run/Phylo In directory dev.open-bio.org:/tmp/cvs-serv10593/Bio/Tools/Run/Phylo Modified Files: PhyloBase.pm Log Message: _write_alignment() now allows options to be passed through to AlignIO Index: PhyloBase.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-run/Bio/Tools/Run/Phylo/PhyloBase.pm,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** PhyloBase.pm 15 Feb 2007 13:39:36 -0000 1.5 --- PhyloBase.pm 16 Aug 2007 10:52:16 -0000 1.6 *************** *** 101,110 **** desired format to a temp file. Returns : filename ! Args : string to desribe format (default 'fasta') =cut sub _write_alignment { ! my ($self, $format) = @_; my $align = $self->_alignment || $self->throw("_write_alignment called when _alignment had not been set"); $format ||= 'fasta'; --- 101,111 ---- desired format to a temp file. Returns : filename ! Args : string to desribe format (default 'fasta'), any other options to pass ! to AlignIO =cut sub _write_alignment { ! my ($self, $format, @options) = @_; my $align = $self->_alignment || $self->throw("_write_alignment called when _alignment had not been set"); $format ||= 'fasta'; *************** *** 112,116 **** my ($tfh, $tempfile) = $self->io->tempfile(-dir => $self->tempdir); ! my $out = Bio::AlignIO->new(-verbose => $self->verbose, '-fh' => $tfh, '-format' => $format); $align->set_displayname_flat; $out->write_aln($align); --- 113,117 ---- my ($tfh, $tempfile) = $self->io->tempfile(-dir => $self->tempdir); ! my $out = Bio::AlignIO->new(-verbose => $self->verbose, '-fh' => $tfh, '-format' => $format, @options); $align->set_displayname_flat; $out->write_aln($align); From bugzilla-daemon at portal.open-bio.org Thu Aug 16 06:58:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 06:58:05 -0400 Subject: [Bioperl-guts-l] [Bug 1896] Implement a bioperl-run::Bio::Tools::Run::Phylo::PAML::Evolver module In-Reply-To: Message-ID: <200708161058.l7GAw51B016950@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1896 bix at sendu.me.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from bix at sendu.me.uk 2007-08-16 06:58 EST ------- This seems to be in place. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 16 07:03:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 07:03:45 -0400 Subject: [Bioperl-guts-l] [Bug 2347] New: Bio::Tools::Run::Phylo::PAML::Baseml needs polishing and completion Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2347 Summary: Bio::Tools::Run::Phylo::PAML::Baseml needs polishing and completion Product: BioPerl Version: unspecified Platform: All OS/Version: All Status: NEW Severity: normal Priority: P3 Component: bioperl-run AssignedTo: bioperl-guts-l at bioperl.org ReportedBy: bix at sendu.me.uk I have Bio::Tools::Run::Phylo::PAML::Baseml in a 'theoretically' working condition, but after the wrapper has successfully gotten Baseml to run it passes the result file to Bio::Tools::Phylo::PAML which then fails to parse the results correctly. This may be a bug in Bio::Tools::Phylo::PAML or something to do with the kind of result file Bio::Tools::Run::Phylo::PAML::Baseml is generating (and it should generate a different kind). Bio::Tools::Run::Phylo::PAML::Baseml also needs some polish for the docs and tests for it added to t/PAML.t -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From cjfields at dev.open-bio.org Sun Aug 19 20:57:59 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 20 Aug 2007 00:57:59 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Index BlastTable.pm,NONE,1.1 Message-ID: <200708200057.l7K0vxqG010242@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Index In directory dev.open-bio.org:/tmp/cvs-serv10217 Added Files: BlastTable.pm Log Message: Initial commit; index class for -m 9 BLAST output --- NEW FILE: BlastTable.pm --- # $Id: BlastTable.pm,v 1.1 2007/08/20 00:57:56 cjfields Exp $ # # BioPerl module for Bio::Index::BlastTable # # Cared for by Chris Fields # # Copyright Chris Fields # # You may distribute this module under the same terms as perl itself # POD documentation - main docs before the code =head1 NAME Bio::Index::BlastIndex - Indexes tabular Blast reports (-m 9 format) and supports retrieval based on query accession(s) =head1 SYNOPSIS use strict; use Bio::Index::BlastTable; my ($indexfile,$file1,$file2,$query); my $index = Bio::Index::Blast->new(-filename => $indexfile, -write_flag => 1); $index->make_index($file1,$file2); my $data = $index->get_stream($query); my $blast_result = $index->fetch_report($query); print "query is ", $blast_result->query_name, "\n"; while ( my $hit = $blast_result->next_hit ) { print "Name ", $hit->name,"\n"; while ( my $hsp = $hit->next_hsp ) { print "Score ", $hsp->score; } print "\n"; } =head1 DESCRIPTION This object allows one to build an index on a tabular BLAST file (or files) and provide quick access to the blast report for that accession. This also allows for ID parsing using a callback: $inx->id_parser(\&get_id); # make the index $inx->make_index($file_name); # here is where the retrieval key is specified sub get_id { my $line = shift; $line =~ /^>.+gi\|(\d+)/; $1; } The indexer is capable of indexing based on multiple IDs passed back from the callback; this is assuming of course all IDs are unique. Note: for best results 'use strict'. =head1 FEEDBACK =head2 Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated. bioperl-l at bioperl.org - General discussion http://bioperl.org/wiki/Mailing_lists - About the mailing lists =head2 Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web: http://bugzilla.open-bio.org/ =head1 AUTHOR - Jason Stajich Email jason-at-bioperl-dot-org =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ =cut # Let the code begin... package Bio::Index::BlastTable; use strict; use IO::String; use Bio::SearchIO; use base qw(Bio::Index::Abstract Bio::Root::Root); sub _version { return ${Bio::Root::Version::VERSION}; } =head2 new Usage : $index = Bio::Index::Abstract->new( -filename => $dbm_file, -write_flag => 0, -dbm_package => 'DB_File', -verbose => 0); Function: Returns a new index object. If filename is specified, then open_dbm() is immediately called. Bio::Index::Abstract->new() will usually be called directly only when opening an existing index. Returns : A new index object Args : -filename The name of the dbm index file. -write_flag TRUE if write access to the dbm file is needed. -dbm_package The Perl dbm module to use for the index. -verbose Print debugging output to STDERR if TRUE. =cut sub new { my($class, at args) = @_; my $self = $class->SUPER::new(@args); } =head2 Bio::Index::Blast implemented methods =cut =head2 fetch_report Title : fetch_report Usage : my $blastreport = $idx->fetch_report($id); Function: Returns a Bio::SearchIO report object for a specific blast report Returns : Bio::SearchIO Args : valid id =cut sub fetch_report{ my ($self,$id) = @_; my $fh = $self->get_stream($id); my $report = Bio::SearchIO->new(-noclose => 1, -format => 'blasttable', -fh => $fh); return $report->next_result; } =head2 Require methods from Bio::Index::Abstract =cut =head2 _index_file Title : _index_file Usage : $index->_index_file( $file_name, $i ) Function: Specialist function to index BLAST report file(s). Is provided with a filename and an integer by make_index in its SUPER class. Example : Returns : Args : =cut sub _index_file { my( $self, $file, # File name $i, # Index-number of file being indexed ) = @_; my( $begin, # Offset from start of file of the start # of the last found record. ); open(my $BLAST, '<', $file) or $self->throw("cannot open file $file\n"); my $indexpoint = 0; my $lastline = 0; while( <$BLAST> ) { if(m{^#\s+T?BLAST[PNX]} ) { my $len = length $_; $indexpoint = tell($BLAST)-$len; } if(m{^#\s+Query:\s+([^\n]+)}) { foreach my $id ($self->id_parser()->($1)) { $self->debug("id is $id, begin is $indexpoint\n"); $self->add_record($id, $i, $indexpoint); } } } } # shamelessly stolen from Bio::Index::Fasta =head2 id_parser Title : id_parser Usage : $index->id_parser( CODE ) Function: Stores or returns the code used by record_id to parse the ID for record from a string. Useful for (for instance) specifying a different parser for different flavours of blast dbs. Returns \&default_id_parser (see below) if not set. If you supply your own id_parser subroutine, then it should expect a fasta description line. An entry will be added to the index for each string in the list returned. Example : $index->id_parser( \&my_id_parser ) Returns : ref to CODE if called without arguments Args : CODE =cut sub id_parser { my( $self, $code ) =@_; if ($code) { $self->{'_id_parser'} = $code; } return $self->{'_id_parser'} || \&default_id_parser; } =head2 default_id_parser Title : default_id_parser Usage : $id = default_id_parser( $header ) Function: The default Blast Query ID parser for Bio::Index::Blast.pm Returns $1 from applying the regexp /^>\s*(\S+)/ to $header. Returns : ID string Args : a header line string =cut sub default_id_parser { if ($_[0] =~ /^\s*(\S+)/) { return $1; } else { return; } } =head2 Bio::Index::Abstract methods =cut =head2 filename Title : filename Usage : $value = $self->filename(); $self->filename($value); Function: Gets or sets the name of the dbm index file. Returns : The current value of filename Args : Value of filename if setting, or none if getting the value. =head2 write_flag Title : write_flag Usage : $value = $self->write_flag(); $self->write_flag($value); Function: Gets or sets the value of write_flag, which is wether the dbm file should be opened with write access. Returns : The current value of write_flag (default 0) Args : Value of write_flag if setting, or none if getting the value. =head2 dbm_package Usage : $value = $self->dbm_package(); $self->dbm_package($value); Function: Gets or sets the name of the Perl dbm module used. If the value is unset, then it returns the value of the package variable $USE_DBM_TYPE or if that is unset, then it chooses the best available dbm type, choosing 'DB_File' in preference to 'SDBM_File'. Bio::Abstract::Index may work with other dbm file types. Returns : The current value of dbm_package Args : Value of dbm_package if setting, or none if getting the value. =head2 get_stream Title : get_stream Usage : $stream = $index->get_stream( $id ); Function: Returns a file handle with the file pointer at the approprite place This provides for a way to get the actual file contents and not an object WARNING: you must parse the record deliminter *yourself*. Abstract wont do this for you So this code $fh = $index->get_stream($myid); while( <$fh> ) { # do something } will parse the entire file if you do not put in a last statement in, like while( <$fh> ) { /^\/\// && last; # end of record # do something } Returns : A filehandle object Args : string represents the accession number Notes : This method should not be used without forethought =head2 open_dbm Usage : $index->open_dbm() Function: Opens the dbm file associated with the index object. Write access is only given if explicitly asked for by calling new(-write => 1) or having set the write_flag(1) on the index object. The type of dbm file opened is that returned by dbm_package(). The name of the file to be is opened is obtained by calling the filename() method. Example : $index->_open_dbm() Returns : 1 on success =head2 _version Title : _version Usage : $type = $index->_version() Function: Returns a string which identifes the version of an index module. Used to permanently identify an index file as having been created by a particular version of the index module. Must be provided by the sub class Example : Returns : Args : none =head2 _filename Title : _filename Usage : $index->_filename( FILE INT ) Function: Indexes the file Example : Returns : Args : =head2 _file_handle Title : _file_handle Usage : $fh = $index->_file_handle( INT ) Function: Returns an open filehandle for the file index INT. On opening a new filehandle it caches it in the @{$index->_filehandle} array. If the requested filehandle is already open, it simply returns it from the array. Example : $fist_file_indexed = $index->_file_handle( 0 ); Returns : ref to a filehandle Args : INT =head2 _file_count Title : _file_count Usage : $index->_file_count( INT ) Function: Used by the index building sub in a sub class to track the number of files indexed. Sets or gets the number of files indexed when called with or without an argument. Example : Returns : INT Args : INT =head2 add_record Title : add_record Usage : $index->add_record( $id, @stuff ); Function: Calls pack_record on @stuff, and adds the result of pack_record to the index database under key $id. If $id is a reference to an array, then a new entry is added under a key corresponding to each element of the array. Example : $index->add_record( $id, $fileNumber, $begin, $end ) Returns : TRUE on success or FALSE on failure Args : ID LIST =head2 pack_record Title : pack_record Usage : $packed_string = $index->pack_record( LIST ) Function: Packs an array of scalars into a single string joined by ASCII 034 (which is unlikely to be used in any of the strings), and returns it. Example : $packed_string = $index->pack_record( $fileNumber, $begin, $end ) Returns : STRING or undef Args : LIST =head2 unpack_record Title : unpack_record Usage : $index->unpack_record( STRING ) Function: Splits the sting provided into an array, splitting on ASCII 034. Example : ( $fileNumber, $begin, $end ) = $index->unpack_record( $self->db->{$id} ) Returns : A 3 element ARRAY Args : STRING containing ASCII 034 =head2 DESTROY Title : DESTROY Usage : Called automatically when index goes out of scope Function: Closes connection to database and handles to sequence files Returns : NEVER Args : NONE =cut 1; From cjfields at dev.open-bio.org Sun Aug 19 20:58:38 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 20 Aug 2007 00:58:38 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/SearchIO blasttable.pm, 1.10, 1.11 Message-ID: <200708200058.l7K0wclV010275@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/SearchIO In directory dev.open-bio.org:/tmp/cvs-serv10250 Modified Files: blasttable.pm Log Message: Catch algorithm and version if -m 9 Index: blasttable.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/SearchIO/blasttable.pm,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** blasttable.pm 14 Jun 2007 14:16:14 -0000 1.10 --- blasttable.pm 20 Aug 2007 00:58:36 -0000 1.11 *************** *** 160,165 **** local $/ = "\n"; local $_; ! while( defined ($_ = $self->_readline) ) { next if /^\#/ || /^\s+$/; my ($qname,$hname, $percent_id, $hsp_len, $mismatches,$gapsm, --- 160,170 ---- local $/ = "\n"; local $_; ! my ($alg, $ver); while( defined ($_ = $self->_readline) ) { + # -m 9 only + if(m{^#\s+((?:\S+?)?BLAST[NPX])\s+(.+)}) { + ($alg, $ver) = ($1, $2); + next; + } next if /^\#/ || /^\s+$/; my ($qname,$hname, $percent_id, $hsp_len, $mismatches,$gapsm, *************** *** 177,181 **** $self->start_element({'Name' => 'Result'}); $self->element({'Name' => 'Result_program', ! 'Data' => $self->program_name}); $self->element({'Name' => 'Result_query-def', 'Data' => $qname}); --- 182,188 ---- $self->start_element({'Name' => 'Result'}); $self->element({'Name' => 'Result_program', ! 'Data' => $alg || $self->program_name}); ! $self->element({'Name' => 'Result_version', ! 'Data' => $ver}) if $ver; $self->element({'Name' => 'Result_query-def', 'Data' => $qname}); From cjfields at dev.open-bio.org Sun Aug 19 21:00:34 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 20 Aug 2007 01:00:34 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Tools EUtilities.pm,1.7,1.8 Message-ID: <200708200100.l7K10Y9s010329@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Tools In directory dev.open-bio.org:/tmp/cvs-serv10304 Modified Files: EUtilities.pm Log Message: add docsum printing convenience method. Index: EUtilities.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Tools/EUtilities.pm,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** EUtilities.pm 24 Jul 2007 18:08:46 -0000 1.7 --- EUtilities.pm 20 Aug 2007 01:00:32 -0000 1.8 *************** *** 205,216 **** $self->datatype($type); $self->eutil($eutil); - $response && $self->response($response); - $self->cache_response($cache); - # lazy parsing only implemented for elink and esummary (where returned data ! # can be quite long). Also, no point to parsing lazily when the data is # already in memory in an HTTP::Response object, so turn it off and chunk # the Response object after parsing. ! $lazy = 0 if ($response) || ($eutil ne 'elink' && $eutil ne 'esummary'); # setting parser to 'lazy' mode is permanent (can't reset later) --- 205,214 ---- $self->datatype($type); $self->eutil($eutil); # lazy parsing only implemented for elink and esummary (where returned data ! # can be quite long). Also, no point to parsing lazily when the data is # already in memory in an HTTP::Response object, so turn it off and chunk # the Response object after parsing. ! $response && $self->response($response); ! $self->cache_response($cache); $lazy = 0 if ($response) || ($eutil ne 'elink' && $eutil ne 'esummary'); # setting parser to 'lazy' mode is permanent (can't reset later) *************** *** 806,809 **** --- 804,865 ---- } + =head2 print_DocSums + + Title : print_DocSums + Usage : $docsum->print_DocSums(); + $docsum->print_DocSums(-fh => $fh, -callback => $coderef); + Function : prints item data for all docsums. The default printing method is + each item per DocSum is printed with relevant values if present + in a simple table using Text::Wrap. + Returns : none + Args : [optional] + -file : file to print to + -fh : filehandle to print to (cannot be used concurrently with file) + -cb : coderef to use in place of default print method. This is passed + in a DocSum object; + -wrap : number of columns to wrap default text output to (def = 80) + Note : if -file or -fh are not defined, prints to STDOUT + + =cut + + { + my $DEF_PRINT = sub { + my $ds = shift; + my $string = sprintf("UID: %s\n",$ds->get_id); + # flattened mode + while (my $item = $ds->next_Item('flattened')) { + # not all Items have content, so need to check... + my $content = $item->get_content || ''; + $string .= sprintf("%-20s%s\n",$item->get_name(), + wrap('',' 'x21, ":$content")); + } + $string .= "\n"; + return $string; + }; + + sub print_DocSums { + my $self = shift; + my ($file, $fh, $cb, $wrap) = $self->_rearrange([qw(FILE FH CB WRAP)], @_); + $wrap ||= 80; + if (!$cb) { + eval {use Text::Wrap qw(wrap $columns);}; + $self->throw("Text::Wrap is not available!") if $@; + $Text::Wrap::columns = $wrap; + $cb = $DEF_PRINT; + } else { + $self->throw("Callback must be a code reference") if ref $cb ne 'CODE'; + } + $file ||= $fh; + $self->throw("Have defined both file and filehandle; only use one!") if $file && $fh; + my $io = ($file) ? Bio::Root::IO->new(-input => $file, -flush => 1) : + Bio::Root::IO->new(-flush => 1); # defaults to STDOUT + while (my $ds = $self->next_DocSum) { + my $string = $cb->($ds); + $io->_print($string) if $string; + } + $io->close; + } + } + =head1 Info-related methods *************** *** 1111,1123 **** 'linksets' 'docsums' ! A second argument can also be passed to generate a 'lazy' iterator, ! which loops through and returns objects as they are created (instead ! of creating all data instances up front, then iterating through, ! which is the default). Use of these iterators precludes use of ! rewind() for the time being as we can't guarantee you can rewind(), ! as this depends on whether the data source is seek()able and thus ! 'rewindable'. We will add rewind() support at a later time which ! will work for 'seekable' data. A callback specified using callback() will be used to filter objects --- 1167,1177 ---- 'linksets' 'docsums' + 'histories' ! Note : This function generates a simple coderef that one can use ! independently of the various next_* functions (in fact, the next_* ! functions use lazily created iterators generated via this method, ! while rewind() merely deletes them so they can be regenerated on the ! next call). A callback specified using callback() will be used to filter objects *************** *** 1125,1130 **** normal and lazy iterator types and is the default. If you don't want this, make sure to reset any previously set callbacks via ! reset_callback() (which just deletes the code ref). =cut --- 1179,1201 ---- normal and lazy iterator types and is the default. If you don't want this, make sure to reset any previously set callbacks via ! reset_callback() (which just deletes the code ref). Note that setting ! callback() also changes the behavior of the next_* functions as the ! iterators are generated here (as described above); this is a feature ! and not a bug. + 'Lazy' iterators are considered an experimental feature and may be + modified in the future. A 'lazy' iterator, which loops through and + returns objects as they are created (instead of creating all data + instances up front, then iterating through) is returned if the + parser is set to 'lazy' mode. This mode is only present for elink + and esummary output as they are the two formats parsed which can + generate potentially thousands of individual objects (note efetch + isn't parsed, so isn't counted). Use of rewind() with these + iterators is not supported for the time being as we can't guarantee + you can rewind(), as this depends on whether the data source is + seek()able and thus 'rewindable'. We will add rewind() support at a + later time which will work for 'seekable' data or possibly cached + objects via Storable or BDB. + =cut From cjfields at dev.open-bio.org Sun Aug 19 21:01:47 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 20 Aug 2007 01:01:47 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/DB EUtilParameters.pm, 1.6, 1.7 EUtilities.pm, 1.43, 1.44 Message-ID: <200708200101.l7K11lIH010405@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/DB In directory dev.open-bio.org:/tmp/cvs-serv10380 Modified Files: EUtilParameters.pm EUtilities.pm Log Message: add convenience method to munge IDs from file if asked (still working on) Index: EUtilParameters.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/DB/EUtilParameters.pm,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** EUtilParameters.pm 24 Jul 2007 20:22:03 -0000 1.6 --- EUtilParameters.pm 20 Aug 2007 01:01:45 -0000 1.7 *************** *** 103,106 **** --- 103,107 ---- use URI; use HTTP::Request; + use Bio::Root::IO; # eutils only has one hostbase URL *************** *** 187,191 **** my ($retmode) = $self->_rearrange(["RETMODE"], at args); $self->_set_from_args(\@args, ! -methods => [@PARAMS, qw(eutil history correspondence)]); $self->eutil() || $self->eutil('efetch'); # set default retmode if not explicitly set --- 188,192 ---- my ($retmode) = $self->_rearrange(["RETMODE"], at args); $self->_set_from_args(\@args, ! -methods => [@PARAMS, qw(eutil history correspondence id_file)]); $self->eutil() || $self->eutil('efetch'); # set default retmode if not explicitly set *************** *** 222,229 **** my ($self, @args) = @_; # allow automated resetting; must check to ensure that retmode isn't explicitly passed ! my $newmode = $self->_rearrange(["RETMODE"], at args); $self->_set_from_args(\@args, -methods => [@PARAMS, qw(eutil correspondence history)]); # set default retmode if not explicitly passed $self->set_default_retmode unless $newmode; } --- 223,232 ---- my ($self, @args) = @_; # allow automated resetting; must check to ensure that retmode isn't explicitly passed ! my ($newmode,$file) = $self->_rearrange([qw(RETMODE ID_FILE)], at args); $self->_set_from_args(\@args, -methods => [@PARAMS, qw(eutil correspondence history)]); # set default retmode if not explicitly passed $self->set_default_retmode unless $newmode; + $file && $self->id_file($file); + return; } *************** *** 253,261 **** my ($self, @args) = @_; # is there a better way of doing this? probably, but this works... ! my ($retmode) = $self->_rearrange(["RETMODE"], at args); map { defined $self->{"_$_"} && undef $self->{"_$_"} } (@PARAMS, qw(eutil correspondence history_cache request_cache)); $self->_set_from_args(\@args, -methods => [@PARAMS, qw(eutil correspondence history)]); $self->eutil() || $self->eutil('efetch'); $self->set_default_retmode unless $retmode; $self->{'_statechange'} = 1; } --- 256,265 ---- my ($self, @args) = @_; # is there a better way of doing this? probably, but this works... ! my ($retmode,$file) = $self->_rearrange([qw(RETMODE ID_FILE)], at args); map { defined $self->{"_$_"} && undef $self->{"_$_"} } (@PARAMS, qw(eutil correspondence history_cache request_cache)); $self->_set_from_args(\@args, -methods => [@PARAMS, qw(eutil correspondence history)]); $self->eutil() || $self->eutil('efetch'); $self->set_default_retmode unless $retmode; + $file && $self->id_file($file); $self->{'_statechange'} = 1; } *************** *** 494,497 **** --- 498,531 ---- } + =head2 id_file + + Title : id_file + Usage : $p->id_file('_io; + $io->_initialize_io(-input => $file); + my @ids; + while (my $line = $io->_readline) { + chomp $line; + push @ids, $line; + } + $self->_io->close; + $self->id(\@ids); + } + } + =head2 url_base_address *************** *** 577,585 **** } ! ! ! sub exhaust { my $self = shift; ! $self->{'_statechange'} = 0; } --- 611,620 ---- } ! sub _io { my $self = shift; ! if (!defined $self->{'_io'}) { ! $self->{'_io'} = Bio::Root::IO->new(); ! } ! return $self->{'_io'}; } Index: EUtilities.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/DB/EUtilities.pm,v retrieving revision 1.43 retrieving revision 1.44 diff -C2 -d -r1.43 -r1.44 *** EUtilities.pm 24 Jul 2007 18:08:46 -0000 1.43 --- EUtilities.pm 20 Aug 2007 01:01:45 -0000 1.44 *************** *** 647,650 **** --- 647,673 ---- } + =head2 print_DocSums + + Title : print_docsums + Usage : $docsum->print_docsums(); + $docsum->print_docsums(-fh => $fh, -callback => $coderef); + Function : prints item data for all docsums. The default printing method is + each item per DocSum is printed with relevant values if present + in a simple table using Text::Wrap. + Returns : none + Args : [optional] + -file : file to print to + -fh : filehandle to print to (cannot be used concurrently with file) + -cb : coderef to use in place of default print method. This is passed + in a DocSum object; + Note : if -file or -fh are not defined, prints to STDOUT + + =cut + + sub print_DocSums { + my ($self, @args) = @_; + return $self->get_Parser->print_DocSums(@args); + } + =head1 Info-related methods From sendu at dev.open-bio.org Mon Aug 20 06:31:36 2007 From: sendu at dev.open-bio.org (Senduran Balasubramaniam) Date: Mon, 20 Aug 2007 10:31:36 +0000 Subject: [Bioperl-guts-l] bioperl-run/Bio/Tools/Run/Phylo/PAML Codeml.pm, 1.45, 1.46 Message-ID: <200708201031.l7KAVa8i011542@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-run/Bio/Tools/Run/Phylo/PAML In directory dev.open-bio.org:/tmp/cvs-serv11517/Bio/Tools/Run/Phylo/PAML Modified Files: Codeml.pm Log Message: no longer deletes result files before user gets a chance to parse them Index: Codeml.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** Codeml.pm 14 Jun 2007 15:23:09 -0000 1.45 --- Codeml.pm 20 Aug 2007 10:31:33 -0000 1.46 *************** *** 593,600 **** chdir($cwd); } ! unless ( $self->save_tempfiles ) { ! unlink("$codeml_ctl"); ! $self->cleanup(); ! } return ($rc,$parser); } --- 593,597 ---- chdir($cwd); } ! return ($rc,$parser); } From cjfields at dev.open-bio.org Mon Aug 20 07:21:04 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Mon, 20 Aug 2007 11:21:04 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Index Blast.pm,1.30,1.31 Message-ID: <200708201121.l7KBL4nk011835@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Index In directory dev.open-bio.org:/tmp/cvs-serv11810 Modified Files: Blast.pm Log Message: speed up indexing, add id_parser() method Index: Blast.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Index/Blast.pm,v retrieving revision 1.30 retrieving revision 1.31 diff -C2 -d -r1.30 -r1.31 *** Blast.pm 8 Aug 2007 20:38:52 -0000 1.30 --- Blast.pm 20 Aug 2007 11:21:02 -0000 1.31 *************** *** 174,219 **** my $indexpoint = 0; my $lastline = 0; ! while( <$BLAST> ) { ! if( /(T)?BLAST[PNX]/ ) { ! if( @data ) { ! # if we have already read a report ! # then store the data for this report ! # in the CURRENT index ! $self->_process_report($indexpoint, $i, join('', @data)); ! ! } # handle fencepost problem (beginning) ! # by skipping here when empty ! ! # since we are at the beginning of a new report ! # store this begin location for the next index ! $indexpoint = $lastline; ! @data = (); } ! push(@data, $_) if $_; ! $lastline = tell $BLAST; ! } ! # handle fencepost problem (end) ! if( @data ) { ! $self->_process_report($indexpoint, $i, join('', @data)); } } ! sub _process_report { ! my ($self,$begin,$i,$data) = @_; ! if( ! $data ) { ! $self->warn("calling _process_report without a valid data string"); ! return ; } ! # my $id_parser = $self->id_parser; ! my $datal = new IO::String($data); ! my $report = Bio::SearchIO->new->new(-fh => $datal, ! -noclose => 1); ! for (my $result = $report->next_result) { ! my $id = $result->query_name; ! $self->debug("id is $id, begin is $begin\n"); ! $self->add_record($id, $i, $begin); } } --- 174,238 ---- my $indexpoint = 0; my $lastline = 0; ! # fencepost problem: we basically just find the top and the query while( <$BLAST> ) { ! if( /^T?BLAST[PNX]/ ) { ! $indexpoint = tell($BLAST) - length $_; } ! if (/^Query=\s*([^\n]+)$/) { ! foreach my $id ($self->id_parser()->($1)) { ! $self->debug("id is $id, begin is $indexpoint\n"); ! $self->add_record($id, $i, $indexpoint); ! } ! } } } ! # shamelessly stolen from Bio::Index::Fasta ! =head2 id_parser ! ! Title : id_parser ! Usage : $index->id_parser( CODE ) ! Function: Stores or returns the code used by record_id to ! parse the ID for record from a string. Useful ! for (for instance) specifying a different ! parser for different flavours of blast dbs. ! Returns \&default_id_parser (see below) if not ! set. If you supply your own id_parser ! subroutine, then it should expect a fasta ! description line. An entry will be added to ! the index for each string in the list returned. ! Example : $index->id_parser( \&my_id_parser ) ! Returns : ref to CODE if called without arguments ! Args : CODE ! ! =cut ! ! sub id_parser { ! my( $self, $code ) =@_; ! ! if ($code) { ! $self->{'_id_parser'} = $code; } ! return $self->{'_id_parser'} || \&default_id_parser; ! } ! =head2 default_id_parser ! ! Title : default_id_parser ! Usage : $id = default_id_parser( $header ) ! Function: The default Blast Query ID parser for Bio::Index::Blast.pm ! Returns $1 from applying the regexp /^>\s*(\S+)/ ! to $header. ! Returns : ID string ! Args : a header line string ! ! =cut ! ! sub default_id_parser { ! if ($_[0] =~ /^\s*(\S+)/) { ! return $1; ! } else { ! return; } } From jason at dev.open-bio.org Mon Aug 20 17:08:37 2007 From: jason at dev.open-bio.org (Jason Stajich) Date: Mon, 20 Aug 2007 21:08:37 +0000 Subject: [Bioperl-guts-l] bioperl-live/scripts/utilities dbsplit.PLS, 1.8, 1.9 Message-ID: <200708202108.l7KL8blR014486@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/scripts/utilities In directory dev.open-bio.org:/tmp/cvs-serv14458 Modified Files: dbsplit.PLS Log Message: get the count right every time Index: dbsplit.PLS =================================================================== RCS file: /home/repository/bioperl/bioperl-live/scripts/utilities/dbsplit.PLS,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** dbsplit.PLS 4 Jul 2006 22:23:29 -0000 1.8 --- dbsplit.PLS 20 Aug 2007 21:08:35 -0000 1.9 *************** *** 93,97 **** my $scount = 0; while( my $seq = $in->next_seq ) { ! if( ++$scount >= $dbsize && $count ) { $out->close(); undef($out); --- 93,97 ---- my $scount = 0; while( my $seq = $in->next_seq ) { ! if( ++$scount > $dbsize && $count ) { $out->close(); undef($out); *************** *** 99,103 **** $out = new Bio::SeqIO(-format => $outformat, -file => ">$prefix.$count"); ! $scount = 0; } $out->write_seq($seq); --- 99,103 ---- $out = new Bio::SeqIO(-format => $outformat, -file => ">$prefix.$count"); ! $scount = 1; } $out->write_seq($seq); From cjfields at dev.open-bio.org Tue Aug 21 12:14:56 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 21 Aug 2007 16:14:56 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Tools/Run RemoteBlast.pm, 1.45, 1.46 Message-ID: <200708211614.l7LGEuRi017715@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Tools/Run In directory dev.open-bio.org:/tmp/cvs-serv17690 Modified Files: RemoteBlast.pm Log Message: catch ERROR flag Index: RemoteBlast.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Tools/Run/RemoteBlast.pm,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** RemoteBlast.pm 1 Jul 2007 22:52:55 -0000 1.45 --- RemoteBlast.pm 21 Aug 2007 16:14:53 -0000 1.46 *************** *** 601,604 **** --- 601,610 ---- $waiting = 0; last; + } elsif ( /ERROR/i ) { + close($TMP); + open(my $ERR, "<$tempfile") or $self->throw("cannot open file $tempfile"); + $self->warn(join("", <$ERR>)); + close $ERR; + return -1; } From cjfields at dev.open-bio.org Tue Aug 21 17:55:27 2007 From: cjfields at dev.open-bio.org (Christopher John Fields) Date: Tue, 21 Aug 2007 21:55:27 +0000 Subject: [Bioperl-guts-l] bioperl-live/Bio/Index Stockholm.pm,NONE,1.1 Message-ID: <200708212155.l7LLtRIN018228@dev.open-bio.org> Update of /home/repository/bioperl/bioperl-live/Bio/Index In directory dev.open-bio.org:/tmp/cvs-serv18203 Added Files: Stockholm.pm Log Message: initial commit; simple index class for Rfam/Pfam alignments --- NEW FILE: Stockholm.pm --- # $Id: Stockholm.pm,v 1.1 2007/08/21 21:55:25 cjfields Exp $ # # BioPerl module for Bio::Index::Stockholm # # Cared for by Chris Fields # # Copyright Chris Fields # # You may distribute this module under the same terms as perl itself # POD documentation - main docs before the code =head1 NAME Bio::Index::Stockholm - Indexes Stockholm format alignments (such as those from Pfam and Rfam. Retrieves raw stream data using the ID or a Bio::SimpleAlign (via Bio::AlignIO) =head1 SYNOPSIS use strict; use Bio::Index::Stockholm; my ($indexfile,$file1,$file2,$query); my $index = Bio::Index::Stockholm->new(-filename => $indexfile, -write_flag => 1); $index->make_index($file1,$file2); # get raw data stream starting at alignment position my $fh = $index->get_stream($query); # fetch individual alignment my $align = $index->fetch_aln($query); # alias for fetch_report my $align = $index->fetch_report($query); # same as above print "query is ", $align->display_id, "\n"; =head1 DESCRIPTION This object allows one to build an index for any file (or files) containing Stockholm alignment format (such as Rfam and Pfam) and provides quick access to the alignment based on the alignment ID. This also allows for ID parsing using a callback: $inx->id_parser(\&get_id); # make the index $inx->make_index($file_name); # here is where the retrieval key is specified sub get_id { my $line = shift; $line =~ /^>.+gi\|(\d+)/; $1; } The indexer is capable of indexing based on multiple IDs passed back from the callback; this is assuming of course all IDs are unique. The default is to use the alignment ID provided for Rfam/Pfam output. Note: for best results 'use strict'. =TODO - allow using an alternative regex for indexing (for instance, the ID instead of AC) =head1 FEEDBACK =head2 Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated. bioperl-l at bioperl.org - General discussion http://bioperl.org/wiki/Mailing_lists - About the mailing lists =head2 Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web: http://bugzilla.open-bio.org/ =head1 AUTHOR - Chris Fields Email cjfields-at-bioperl-dot-org =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ =cut # Let the code begin... package Bio::Index::Stockholm; use strict; use Bio::AlignIO; use base qw(Bio::Index::Abstract Bio::Root::Root); sub _version { return ${Bio::Root::Version::VERSION}; } =head2 new Usage : $index = Bio::Index::Abstract->new( -filename => $dbm_file, -write_flag => 0, -dbm_package => 'DB_File', -verbose => 0); Function: Returns a new index object. If filename is specified, then open_dbm() is immediately called. Bio::Index::Abstract->new() will usually be called directly only when opening an existing index. Returns : A new index object Args : -filename The name of the dbm index file. -write_flag TRUE if write access to the dbm file is needed. -dbm_package The Perl dbm module to use for the index. -verbose Print debugging output to STDERR if TRUE. =cut sub new { my($class, at args) = @_; my $self = $class->SUPER::new(@args); } =head2 Bio::Index::Stockholm implemented methods =cut =head2 fetch_report Title : fetch_report Usage : my $align = $idx->fetch_report($id); Function: Returns a Bio::SimpleAlign object for a specific alignment Returns : Bio::SimpleAlign Args : valid id =cut sub fetch_report{ my ($self,$id) = @_; my $fh = $self->get_stream($id); my $report = Bio::AlignIO->new(-noclose => 1, -format => 'stockholm', -fh => $fh); return $report->next_aln; } =head2 fetch_report Title : fetch_report Usage : my $align = $idx->fetch_report($id); Function: Returns a Bio::SimpleAlign object for a specific alignment Returns : Bio::SimpleAlign Args : valid id Note : alias for fetch_report =cut *fetch_aln = \&fetch_report; =head2 Require methods from Bio::Index::Abstract =cut =head2 _index_file Title : _index_file Usage : $index->_index_file( $file_name, $i ) Function: Specialist function to index report file(s). Is provided with a filename and an integer by make_index in its SUPER class. Example : Returns : Args : =cut sub _index_file { my( $self, $file, # File name $i, # Index-number of file being indexed ) = @_; my( $begin, # Offset from start of file of the start # of the last found record. ); local $/ ="\n"; open(my $BLAST, '<', $file) or $self->throw("cannot open file $file\n"); my $indexpoint = 0; my $lastline = 0; while( <$BLAST> ) { if(m{^#\sSTOCKHOLM} ) { $indexpoint = tell($BLAST)-length $_; $self->debug("Index:$indexpoint\n") } if(m{^#=GF\s+AC\s+(\S[^\n]+)}) { foreach my $id ($self->id_parser()->($1)) { $self->debug("id is $id, begin is $indexpoint\n"); #$self->add_record($id, $i, $indexpoint); } } } } # shamelessly stolen from Bio::Index::Fasta =head2 id_parser Title : id_parser Usage : $index->id_parser( CODE ) Function: Stores or returns the code used by record_id to parse the ID for record from a string. Useful for (for instance) specifying a different parser for different flavours of IDs (for instance, custom stockholm-formated files). Returns \&default_id_parser (see below) if not set. If you supply your own id_parser subroutine, then it should expect a fasta description line. An entry will be added to the index for each string in the list returned. Example : $index->id_parser( \&my_id_parser ) Returns : ref to CODE if called without arguments Args : CODE =cut sub id_parser { my( $se